Data Links #72

New data analysis competitions

On Kaggle: Outbrain Click Prediction.

Outbrain pairs relevant content with curious readers in about 250 billion personalized recommendations every month across many thousands of sites. In this competition, Kagglers are challenged to predict which pieces of content its global base of users are likely to click on.

$12,000 for the top place.

Privacy

Yahoo secretly scanned customer emails for US intelligence, sources say.

Yahoo Inc last year secretly built a custom software program to search all of its customers' incoming emails for specific information provided by U.S. intelligence officials, according to people familiar with the matter.

The company complied with a classified U.S. government directive, scanning hundreds of millions of Yahoo Mail accounts at the behest of the National Security Agency or FBI, said two former employees and a third person apprised of the events.

Hacker News discussion.

[PDF] Networks of Control: A Report on Corporate Surveillance, Digital Tracking, Big Data & Privacy.

Tech

CIA 'Siren Servers' can predict social uprisings 3-5 days in advance.

The CIA claims to be able to predict social unrest days before it happens thanks to powerful super computers dubbed Siren Servers by the father of Virtual Reality, Jaron Lanier.

Though crime happens everywhere, predictive policing tools send cops to poor/black neighborhoods.

The reason that Predpol predicted all the crime would occur in a poor black neighborhood in Oakland is that Oakland's notoriously racist police force concentrates its policing there, and you can only find crime in places where you look for it. Predpol and tools like it are sold as data-driven ways to overcome this kind of police bias, but really, they're just ways of giving bias a veneer of objective responsibility.

Is Your Big Data Project a "Weapon of Math Destruction"?

For those of us who make a living solving problems, the current deluge of big data might seem like a wonderland. Data scientists and programmers can now draw on reams of human data—and apply them—in ways that would have been unthinkable only a decade ago.

But amid all the excitement, we're beginning to see hints that our nice, tidy algorithms and predictive models might be prone to the same shortcomings that the humans who create them are. Take, for example, the revelation that Google disproportionately served ads for high-paying jobs to men rather than women. And there's the troubling recent discovery that a criminal risk assessment score disproportionately flagged many African Americans as higher risk, sometimes resulting in longer prison sentences.

A bot crawled thousands of studies looking for simple math errors. The results are concerning.

But she [Michèle Nuijten, a PhD student at Tilburg University in the Netherlands] and some colleagues in the Netherlands were curious enough to check. They built a computer program that could quickly scan published psychological papers and check the math on the statistics. They called their program "Statcheck" and ran it on 30,717 papers.

Rounding errors, and other small potential mistakes in calculating the statistics, were rampant.
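To make the idea of "checking the math on the statistics" concrete, here is a minimal sketch of that kind of consistency check. It is not Statcheck itself: the regular expression, the restriction to z-tests, the function names, and the rounding tolerance are all assumptions chosen for illustration. It recomputes a two-sided p-value from a reported z statistic and flags results where the reported p-value could not have come from rounding the recomputed one.

```python
import math
import re

# Hypothetical pattern for results written like "z = 2.05, p = .04".
# Real papers use many more formats; this one is an assumption.
PATTERN = re.compile(r"z\s*=\s*(-?\d*\.?\d+)\s*,\s*p\s*=\s*(\d*\.\d+)")


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def check_reported_results(text):
    """Return (z, reported_p, recomputed_p, consistent) for each match."""
    findings = []
    for match in PATTERN.finditer(text):
        z = float(match.group(1))
        p_text = match.group(2)
        reported = float(p_text)
        # Number of decimals the authors reported, e.g. ".04" -> 2.
        decimals = len(p_text.split(".")[1])
        recomputed = two_sided_p(z)
        # Consistent if the reported value is within half a unit of the
        # last reported decimal place (assumed tolerance).
        consistent = abs(recomputed - reported) <= 0.5 * 10 ** (-decimals)
        findings.append((z, reported, recomputed, consistent))
    return findings


if __name__ == "__main__":
    sample = "Group A differed (z = 2.05, p = .04); group B too (z = 1.20, p = .04)."
    for z, reported, recomputed, ok in check_reported_results(sample):
        status = "consistent" if ok else "INCONSISTENT"
        print(f"z={z:.2f} reported p={reported} recomputed p={recomputed:.4f} {status}")
```

In the sample text, the first result passes the check and the second is flagged, since a z of 1.20 corresponds to a two-sided p-value far above .04.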
