How-to

  • Python implementations of several machine learning algorithms. GitHub repository.

  • [PDF, academic paper] Predicting antimicrobial drug consumption using web search data.

  • Related to the previous one: sometimes it seems very easy to obtain certain kinds of data, and then no one bothers to double-check. Caveat Emptor, Computational Social Science: Large-Scale Missing Data in a Widely-Published Reddit Corpus.

    As researchers use computational methods to study complex social behaviors at scale, the validity of this computational social science depends on the integrity of the data. On July 2, 2015, Jason Baumgartner published a dataset advertised to include "every publicly available Reddit comment" which was quickly shared on Bittorrent and the Internet Archive. This data quickly became the basis of many academic papers on topics including machine learning, social behavior, politics, breaking news, and hate speech. We have discovered substantial gaps and limitations in this dataset which may contribute to bias in the findings of that research. In this paper, we document the dataset, substantial missing observations in the dataset, and the risks to research validity from those gaps.

  • More from the messy, hard, and not always well-compensated world of data journalism: Measuring the Toll of the Opioid Epidemic Is Tougher Than It Seems.

    One of our editors set out to create an ambitious list of data sources on the opioid epidemic. Much of what he found was out of date, and some data contradicted other data.

Privacy

Tech

  • Deep-learning system generates specific genre-based music.

    Izaro Goienetxea, a UPV/EHU researcher, has developed a method for automatically generating new tunes on the basis of a collection or corpus comprising tunes used in bertso—a form of extempore, sung, Basque verse-making. She has also presented a new way of representing pieces of music, and developed a new method for automatically classifying music. PLOS ONE has reported on the research conducted in the UPV/EHU's Robotics and Autonomous Systems research group.

    PLOS ONE paper right here.

  • Making music using new sounds generated with machine learning.

    Technology has always played a role in inspiring musicians in new and creative ways. The guitar amp gave rock musicians a new palette of sounds to play with in the form of feedback and distortion. And the sounds generated by synths helped shape the sound of electronic music. But what about new technologies like machine learning models and algorithms? How might they play a role in creating new tools and possibilities for a musician’s creative process? Magenta, a research project within Google, is currently exploring answers to these questions.

  • Artificial Intelligence and the Attack/Defense Balance.

    Artificial intelligence technologies have the potential to upend the longstanding advantage that attack has over defense on the Internet. This has to do with the relative strengths and weaknesses of people and computers, how those all interplay in Internet security, and where AI technologies might change things.

    You can divide Internet security tasks into two sets: what humans do well and what computers do well. Traditionally, computers excel at speed, scale, and scope. They can launch attacks in milliseconds and infect millions of computers. They can scan computer code to look for particular kinds of vulnerabilities, and data packets to identify particular kinds of attacks.

  • Pentagon Wants Silicon Valley’s Help on A.I.

    There is little doubt that the Defense Department needs help from Silicon Valley’s biggest companies as it pursues work on artificial intelligence. The question is whether the people who work at those companies are willing to cooperate.

    Related: prove that you are not an Evil corporate person.

  • Microsoft reaches a historic milestone, using AI to match human performance in translating news from Chinese to English.

    A team of Microsoft researchers said Wednesday that they believe they have created the first machine translation system that can translate sentences of news articles from Chinese to English with the same quality and accuracy as a person.

  • China to bar people with bad 'social credit' from planes, trains. Hacker News discussion.

    China said it will begin applying its so-called social credit system to flights and trains and stop people who have committed misdeeds from taking such transport for up to a year.

Visualizations


Data Links is a periodic blog post published on Sundays (specific time may vary) that collects interesting links about data science, machine learning, and related topics. You can subscribe to it using the general blog RSS feed or this one, which contains only these articles, if you are not interested in other things I might publish.

Have you read an article you liked and would you like to suggest it for the next issue? Just contact me!