New data analysis competitions


  • Big Data's Unexplored Frontier: Recorded Music.

    While still a vast field, a huge part of machine learning exists for what may seem to be a relatively narrow subset of problems. These are problems involving visual processing: character recognition, facial recognition, the generation of trippy images dominated by populations of dogslugs, birdlegs, and spidereyes.

    This isn't accidental. Image data is unique in its suitability for machine learning tasks. It naturally occurs as multidimensional arrays—tensors, really—of pixel data. It's more at the fringes of machine learning that audio data gets a turn. Part of the problem is that, despite the vast amounts of digital audio data that exists in the world, there is a relative lack of openly accessible computational datasets. There's pretty much just one, actually: the Million Song Dataset, which offers some 280 GB of feature data extracted from 1 million audio tracks. Musicology remains largely old-school.

  • Meet the New AI Challenging Human Poker Pros.

    In 2015, several of the world's top poker players faced down a supercomputer-powered artificial intelligence named Claudico during a grueling 80,000 hands of no-limit Texas Hold'em. Beginning tomorrow [note: meaning Jan 11th], a rematch of humans versus AI will test whether humanity can hold its own against an even more capable challenger.

  • Face2Gene: facial analysis software for spotting genetic conditions.

    Face2Gene takes advantage of the fact that so many genetic conditions have a tell-tale "face"—a unique constellation of features that can provide clues to a potential diagnosis. It is just one of several new technologies taking advantage of how quickly modern computers can analyze, sort, and find patterns across huge reams of data.

  • A bit far from what I normally put here, but this Data Scientist position sounds amazingly interesting.

    This project seeks to employ techniques from computational social science and digital geography in order to achieve two primary objectives. First, we seek to scrape relevant geographic information out of datasets of darknet sales: building some re-usable data at the country level (e.g. 'the number of heroin sellers based in every country', 'the total number of vendors selling weapons in every country' etc.). Second, we seek to map, visualise, and analyse those data: using multiple variables (the various categories of products and services) to ask 'what is the geography of illicit products and services?' We hope to further ask 'does the online trade of illicit goods and services have a significantly different geography from conventional mappings of that trade?' In other words, can data from the darknet be used as a proxy for more traditional (and much harder to collect) data about illegal economic activities?
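
To give a feel for the "country level" data the job description above talks about, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the records, the field names (`vendor`, `country`, `category`) and the helper `vendors_per_country` are hypothetical and have nothing to do with the project's real data or code.

```python
from collections import defaultdict

# Hypothetical scraped listings; field names and values are made up for illustration.
listings = [
    {"vendor": "v1", "country": "NL", "category": "heroin"},
    {"vendor": "v1", "country": "NL", "category": "heroin"},   # same vendor, second listing
    {"vendor": "v2", "country": "NL", "category": "weapons"},
    {"vendor": "v3", "country": "US", "category": "heroin"},
    {"vendor": "v4", "country": "US", "category": "weapons"},
    {"vendor": "v4", "country": "US", "category": "heroin"},
]


def vendors_per_country(records, category):
    """Count distinct vendors offering `category` in each country."""
    vendors = defaultdict(set)
    for record in records:
        if record["category"] == category:
            # A set keeps each vendor once, however many listings they have.
            vendors[record["country"]].add(record["vendor"])
    return {country: len(names) for country, names in vendors.items()}


heroin_sellers = vendors_per_country(listings, "heroin")
weapon_sellers = vendors_per_country(listings, "weapons")
print(heroin_sellers)
print(weapon_sellers)
```

Counting distinct vendors rather than listings matters here, since one seller may post many listings; a per-country table like this is the kind of thing that could then be mapped or compared with conventional estimates.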


Data Links is a periodic blog post published on Sundays (specific time may vary) which contains interesting links about data science, machine learning and related topics. You can subscribe to it using the general blog RSS feed or this one, which only contains these articles, if you are not interested in other things I might publish.

Have you read an article you liked and would you like to suggest it for the next issue? Just contact me!