New data analysis competitions

How-to

Privacy

  • Invisible Manipulation: 10 ways our data is being used against us. A good summary by the people at Privacy International.

    The era where we were in control of the data on our own computers has been replaced with devices containing sensors we cannot control, storing data we cannot access, in operating systems we cannot monitor, in environments where our rights are rendered meaningless. Soon the default will shift from us interacting directly with our devices to interacting with devices we have no control over and no knowledge that we are generating data. Below we outline 10 ways in which this exploitation and manipulation is already happening.

  • China testing facial-recognition surveillance system in Xinjiang – report. Every article I read about this is scarier than the previous one. In this one...

    Chinese surveillance chiefs are testing a facial-recognition system that alerts authorities when targets stray more than 300 metres from their home or workplace, as part of a surveillance push that critics say has transformed the country’s western fringes into a high-tech police state.

  • Finding Your Voice: Forget About Siri and Alexa — When It Comes to Voice Identification, the “NSA Reigns Supreme”.

    At the height of the Cold War, during the winter of 1980, FBI agents recorded a phone call in which a man arranged a secret meeting with the Soviet embassy in Washington, D.C. On the day of his appointment, however, agents were unable to catch sight of the man entering the embassy. At the time, they had no way to put a name to the caller from just the sound of his voice, so the spy remained anonymous. Over the next five years, he sold details about several secret U.S. programs to the USSR.

    It wasn’t until 1985 that the FBI, thanks to intelligence provided by a Russian defector, was able to establish the caller as Ronald Pelton, a former analyst at the National Security Agency. The next year, Pelton was convicted of espionage.

    Today, FBI and NSA agents would have identified Pelton within seconds of his first call to the Soviets. A classified NSA memo from January 2006 describes NSA analysts using a “technology that identifies people by the sound of their voices” to successfully match old audio files of Pelton to one another. “Had such technologies been available twenty years ago,” the memo stated, “early detection and apprehension could have been possible, reducing the considerable damage Pelton did to national security.”

Tech

  • Deep Empathy. The results are not always good, but the concept is novel and interesting.

    Deep Empathy utilizes deep learning to learn the characteristics of Syrian neighborhoods affected by conflict, and then simulates how cities around the world would look in the midst of a similar conflict. Can this approach -- familiar in a range of artistic applications -- help us to see recognizable elements of our lives through the lens of those experiencing vastly different circumstances, theoretically a world away? And by helping an AI learn empathy, can this AI teach us to care?

  • Alibaba neural network defeats human in global reading test.

    Alibaba says its deep neural network model has outscored humans in a global reading test, paving the way for the underlying technology to reduce the need for human input.

    Next: have an AI read a movie script and tell the writer where it doesn't make sense. It's sorely needed.

  • The accuracy, fairness, and limits of predicting recidivism.

    Algorithms for predicting recidivism are commonly used to assess a criminal defendant’s likelihood of committing a crime. These predictions are used in pretrial, parole, and sentencing decisions. Proponents of these systems argue that big data and advanced machine learning make these analyses more accurate and less biased than humans. We show, however, that the widely used commercial risk assessment software COMPAS is no more accurate or fair than predictions made by people with little or no criminal justice expertise. We further show that a simple linear predictor provided with only two features is nearly equivalent to COMPAS with its 137 features.
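To get a feel for what "a simple linear predictor provided with only two features" can look like, here is a minimal sketch in plain Python. It is not the paper's model or data: the two features (`feature_a`, `feature_b`), the generating weights, and the records are all synthetic assumptions made up for illustration, and the classifier is an ordinary logistic regression fitted by gradient descent.

```python
import math
import random


def sigmoid(z):
    """Logistic function mapping a linear score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, labels, lr=0.5, epochs=300):
    """Fit [bias, w_a, w_b] by batch gradient descent on the log-loss."""
    w = [0.0, 0.0, 0.0]
    n = len(rows)
    for _ in range(epochs):
        grad = [0.0, 0.0, 0.0]
        for (a, b), y in zip(rows, labels):
            err = sigmoid(w[0] + w[1] * a + w[2] * b) - y
            grad[0] += err
            grad[1] += err * a
            grad[2] += err * b
        w = [wi - lr * gi / n for wi, gi in zip(w, grad)]
    return w


def predict(w, a, b):
    """Return 1 if the predicted probability is at least 0.5, else 0."""
    return 1 if sigmoid(w[0] + w[1] * a + w[2] * b) >= 0.5 else 0


# Synthetic data (an assumption, NOT real defendant records): two
# standardized features and a label drawn from a known logistic model.
rng = random.Random(0)
TRUE_W = (0.0, -1.5, 1.5)  # hypothetical generating weights
rows, labels = [], []
for _ in range(1000):
    a, b = rng.gauss(0, 1), rng.gauss(0, 1)
    p = sigmoid(TRUE_W[0] + TRUE_W[1] * a + TRUE_W[2] * b)
    rows.append((a, b))
    labels.append(1 if rng.random() < p else 0)

# Train on the first 800 records, evaluate on the remaining 200.
weights = fit_logistic(rows[:800], labels[:800])
hits = sum(predict(weights, a, b) == y
           for (a, b), y in zip(rows[800:], labels[800:]))
accuracy = hits / 200
print("weights:", weights)
print("held-out accuracy:", accuracy)
```

The whole model is three numbers, so anyone can read off how each feature pushes the prediction. That transparency is a large part of why the comparison with a 137-feature commercial tool is interesting.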

Visualizations


Data Links is a periodic blog post, published on Sundays (the specific time may vary), that collects interesting links about data science, machine learning and related topics. You can subscribe through the general blog RSS feed or, if you are not interested in the other things I might publish, through this one, which contains only these articles.

Have you read an article you liked and would you like to suggest it for the next issue? Just contact me!