New data analysis competitions


  • Reading this news article I found the GDELT project.

    Supported by Google Jigsaw, the GDELT Project monitors the world's broadcast, print, and web news from nearly every corner of every country in over 100 languages and identifies the people, locations, organizations, themes, sources, emotions, counts, quotes, images and events driving our global society every second of every day, creating a free open platform for computing on the entire world.

    Please beware: the news article autoplays a video.

  • Some good discussion on Hacker News about extracting data from Wikipedia.


  • Facebook Failed to Protect 30 Million Users From Having Their Data Harvested by Trump Campaign Affiliate.

    The Intercept interviewed five individuals familiar with Kogan's work for SCL. All declined to be identified, citing concerns about an ongoing inquiry at Cambridge and fears of possible litigation. Two sources familiar with the SCL project told The Intercept that Kogan had arranged for more than 100,000 people to complete the Facebook survey and download an app. A third source with direct knowledge of the project said that Global Science Research obtained data from 185,000 survey participants as well as their Facebook friends. The source said that this group of 185,000 was recruited through a data company, not Mechanical Turk, and that it yielded 30 million usable profiles. No one in this larger group of 30 million knew that "likes" and demographic data from their Facebook profiles were being harvested by political operatives hired to influence American voters.

    As with the fake news issue, I still think this is a completely different kind of problem, one that essentially boils down to cultural differences: whether privacy has any real value under today's social norms. If people treated data uploaded to Facebook (or anywhere in the cloud, for that matter, unless it's properly encrypted) as essentially public and free-for-all, things would be different.


  • Call for proposals: LMAO.

    Where is the humour in data? In these meme-fuelled, data-overloaded, statistically skewed times, we are asking how, when and where can data be funny.

  • The AI Misinformation Epidemic.

    This pairing of interest with ignorance has created a perfect storm for a misinformation epidemic. The outsize demand for stories about AI has created a tremendous opportunity for impostors to capture some piece of this market.

    The author responds to comments on this article here.

Data Links is a periodic blog post, published on Sundays (the specific time may vary), containing interesting links about data science, machine learning and related topics. You can subscribe to it using the general blog RSS feed or, if you are not interested in other things I might publish, this one, which only contains these articles.

Have you read an article you liked and would you like to suggest it for the next issue? Just contact me!