  • The Plan to Test Cities' Sewage for Drugs Is a New Form of Mass Surveillance.

    Across the globe, researchers at wastewater treatment plants are testing for psychoactive substances passed by drug users through their feces and urine. The data can be incredibly valuable, letting scientists and law enforcement quickly track drug use trends and identify new substances on the market. It can also measure the impact of drug policy strategies, even highlight which days of the week drug use spikes [...]

  • One More Time With Feeling: 'Anonymized' User Data Not Really Anonymous.

    As companies and governments increasingly hoover up our personal data, a common refrain to keep people from worrying is the claim that nothing can go wrong -- because the data itself is "anonymized" -- or stripped of personal detail. But time and time again, we've noted how this really is cold comfort; given it takes only a little effort to pretty quickly identify a person based on access to other data sets. As cellular carriers in particular begin to collect every shred of browsing and location data, identifying "anonymized" data using just a little additional context has become arguably trivial.

  • Doug Henwood's Behind the News had a nice interview with Yasha Levine on January 19th, on the subject of encryption and privacy. You can find it here.


  • AI Could Transform the Science of Counting Crowds. But who would lie at the press conference then?

    The Trump administration's controversial attempt to declare its recent presidential inauguration as having "the largest audience to witness an inauguration, period," has inadvertently highlighted the fact that counting crowds remains a painstaking and inexact science. But the rise of artificial intelligence could soon spare crowd scientists the task of manually counting heads.

    Incidentally, I was part of a citizen journalism project in Spain back in 2005: El Manifestómetro. We were doing independent head counts at demonstrations just to prove that the media could do it themselves instead of relying on second-hand numbers. Then the media started interviewing us. Sigh.

  • Facilitating the discovery of public datasets.

    Our ultimate goal is to help foster an ecosystem for publishing, consuming and discovering datasets. As such, this ecosystem would include data publishers, aggregators (in the form of large data repositories that provide additional value by cleaning and reconciling metadata), search engines that enable discovery of the data, and, most important, data consumers.

  • Deep learning algorithm does as well as dermatologists in identifying skin cancer. (HackerNews discussion here).

    Diagnosing skin cancer begins with a visual examination. A dermatologist usually looks at the suspicious lesion with the naked eye and with the aid of a dermatoscope, which is a handheld microscope that provides low-level magnification of the skin. If these methods are inconclusive or lead the dermatologist to believe the lesion is cancerous, a biopsy is the next step.

    Bringing this algorithm into the examination process follows a trend in computing that combines visual processing with deep learning, a type of artificial intelligence modeled after neural networks in the brain. Deep learning has a decades-long history in computer science but it only recently has been applied to visual processing tasks, with great success. The essence of machine learning, including deep learning, is that a computer is trained to figure out a problem rather than having the answers programmed into it.

  • BBC Radio 4 program: Controlling the Unaccountable Algorithm. Nothing in here will be new to readers of Weapons of Math Destruction, but it's an excellent summary otherwise.

  • This one comes directly from liberationtech: how to use sentiment analysis against the market.

    Public Dissentiment is an online tool that helps protesters negatively impact the price of a publicly traded company’s stock by communicating with algorithmic market makers. By using the same algorithmic sentiment analysis techniques as financial trading bots, the app generates posts for social media that link to news stories that will be viewed negatively by algorithmic market makers.

Data Links is a periodic blog post published on Sundays (specific time may vary) which contains interesting links about data science, machine learning and related topics. You can subscribe to it using the general blog RSS feed or this one, which only contains these articles, if you are not interested in other things I might publish.

Have you read an article you liked and would you like to suggest it for the next issue? Just contact me!