Data Links #135

Privacy

Testing Americans’ Tolerance for Surveillance.

In essence, surveillance, broadly defined, is when we’re intentionally watched, monitored or tracked by a third party—usually for their own purposes. This sounds terrifying, and many people would be quick to say so. In theory Americans value our privacy. In practice, we’ll readily cede it for select reasons: safety being the first, convenience being the second.

How Do You Vote? 50 Million Google Images Give a Clue. I have the feeling I've read something like this in the past, but anyway, it's a good summary. The bottom line is that, as more and more information that was previously difficult to obtain becomes ubiquitous, more and more invasive applications will surface. You can choose not to buy a home-spy device, but you can't choose not to park outside (unless, like yours truly, you don't have a driving license, but then you face a completely different set of issues).

What vehicle is most strongly associated with Republican voting districts? Extended-cab pickup trucks. For Democratic districts? Sedans.

Those conclusions may not be particularly surprising. After all, market researchers and political analysts have studied such things for decades.

But what is surprising is how researchers working on an ambitious project based at Stanford University reached those conclusions: by analyzing 50 million images and location data from Google Street View, the street-scene feature of the online giant’s mapping service.

For the first time, helped by recent advances in artificial intelligence, researchers are able to analyze large quantities of images, pulling out data that can be sorted and mined to predict things like income, political leanings and buying habits. In the Stanford study, computers collected details about cars in the millions of images it processed, including makes and models.

Tech

AI System Sorts News Articles By Whether or Not They Contain Actual Information.

In a recent paper published in the Journal of Artificial Intelligence Research, computer scientists Ani Nenkova and Yinfei Yang, of Google and the University of Pennsylvania, respectively, describe a new machine learning approach to classifying written journalism according to a formalized idea of “content density.” With an average accuracy of around 80 percent, their system was able to accurately classify news stories across a wide range of domains, spanning from international relations and business to sports and science journalism, when evaluated against a ground truth dataset of already correctly classified news articles.

How an A.I. ‘Cat-and-Mouse Game’ Generates Believable Fake Photos.

The woman in the photo seems familiar.

She looks like Jennifer Aniston, the “Friends” actress, or Selena Gomez, the child star turned pop singer. But not exactly.

She appears to be a celebrity, one of the beautiful people photographed outside a movie premiere or an awards show. And yet, you cannot quite place her.

That’s because she’s not real. She was created by a machine.

AI in drug discovery is overhyped: examples from AstraZeneca, Harvard, Stanford and Insilico Medicine. Please be aware that the author has skin in the game.

In this craze, lots of pharma/biotech companies and investors wonder whether they should jump on the bandwagon in 2018, or wait and see.

In this post, I argue that they must be careful, because pretty often, AI researchers overhype their achievements, to say the least. This practice is widespread, and for illustration purposes, I looked at recent research from one big Pharma, AstraZeneca, two universities, Harvard and Stanford, and one startup, Insilico Medicine. These labs are quite reputable, they are producing otherwise interesting research.

A.I. and Big Data Could Power a New War on Poverty. Hacker News discussion here. Wars against concepts are scary. So are technological solutions to social problems, for different reasons. Technological wars on social problems are a whole new thing.

Poverty, of course, is a multifaceted phenomenon. But the condition of poverty often entails one or more of these realities: a lack of income (joblessness); a lack of preparedness (education); and a dependency on government services (welfare). A.I. can address all three.

Visualizations

Perceptions of Probability and Numbers.

Data Links is a periodic blog post, published on Sundays (the specific time may vary), that collects interesting links about data science, machine learning and related topics. You can subscribe to it using the general blog RSS feed or this one, which only contains these articles, if you are not interested in other things I might publish.

Have you read an article you liked and would you like to suggest it for the next issue? Just contact me!

There is no comment system. If you want to tell me something about this article, you can do so via e-mail or Mastodon.