Privacy

Casual commuter conversations on light rail trains have an unexpected eavesdropper — NJ Transit. Video and audio surveillance systems designed to make riders more secure are also recording the conversations of light rail passengers at all times. NJ Transit officials say the on-board cameras and audio surveillance systems are needed to fight crime and maintain security.

The investments appear to reflect the CIA's increasing focus on monitoring social media. Last September, David Cohen, the CIA's second-highest ranking official, spoke at length at Cornell University about a litany of challenges stemming from the new media landscape. The Islamic State's "sophisticated use of Twitter and other social media platforms is a perfect example of the malign use of these technologies," he said.

Tech

Working with this data has been challenging for many different reasons. The first reason is, it’s huge—we’re talking about 2.6TB. The second reason is that it didn’t all come at the same time; we didn’t receive a 2.6TB hard drive. We had to deal with incremental information, and we also had to deal with a lot of images. The majority of the files are emails and database files. There are also a lot of PDFs and TIFFs, so we have to do a lot of OCR-ing for millions of documents.

So first, most of the leak was unstructured data. Second, it was not easy working with the structured data. The Mossack Fonseca internal database didn’t come to us in the raw, original format, unfortunately. We had to do reverse-engineering to reconstruct the database, and connect the dots based on codes that the documents had.

Visualizations

There is no comment system. If you want to tell me something about this article, you can do so via e-mail or Mastodon.