Data Links #156

Please note: there may not be a Data Links post next weekend. I have family in town and will be busy in that large room with a blue ceiling that other people call "outside". Fear not, I'll resume the regular schedule by the end of the month.

How-to

Categorizing Listing Photos at Airbnb.
An Introduction to Recurrent Neural Networks.
Mortality in Puerto Rico after Hurricane Maria. The paper, the code.

Privacy

Facebook photo-scanning lawsuit could cost it billions.

A federal judge ruled Monday that millions of the social network’s users can proceed as a group with claims that its photo-scanning technology violated an Illinois law by gathering and storing biometric data without their consent. [...] Facebook has for years encouraged users to tag people in photographs they upload in their personal posts and the social network stores the collected information. The company has used a program it calls DeepFace to match other photos of a person.
Vermont passes first law to crack down on data brokers.

Data brokers in Vermont will now have to register as such with the state; they must take standard security measures and notify authorities of security breaches (no, they weren’t before); and using their data for criminal purposes like fraud is now its own actionable offense.
A trip to the ER with your phone may mean injury lawyer ads for weeks.

Law firms are using geofencing in hospital emergency rooms to target advertisements to patients’ mobile devices as they seek medical care, according to Philadelphia public radio station WHYY. Geofencing can essentially create a digital perimeter around certain locations and target location-aware devices within the borders of those locations. Patients who unwittingly jump that digital fence may see targeted ads for more than a month, and on multiple devices, the outlet notes.
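For the curious, the core geofencing mechanism in that quote is a simple containment test. Below is a minimal sketch that assumes the simplest possible case: a circular fence around one point, checked with the haversine great-circle distance. The `CircularGeofence` class, the `devices_inside` helper, and all coordinates and device IDs are hypothetical illustrations, not how any particular ad platform actually does it (real systems likely use polygons and their own location pipelines).

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


@dataclass
class CircularGeofence:
    """Hypothetical circular geofence: a centre point plus a radius in metres."""
    lat: float
    lon: float
    radius_m: float

    def contains(self, lat: float, lon: float) -> bool:
        """True if (lat, lon) is within radius_m of the centre, using haversine distance."""
        phi1, phi2 = math.radians(self.lat), math.radians(lat)
        dphi = phi2 - phi1
        dlmb = math.radians(lon - self.lon)
        a = (math.sin(dphi / 2) ** 2
             + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
        # min() guards against floating-point values slightly above 1
        distance = 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))
        return distance <= self.radius_m


def devices_inside(fence, pings):
    """Return sorted IDs of devices whose (id, lat, lon) ping falls inside the fence."""
    return sorted({device_id for device_id, lat, lon in pings
                   if fence.contains(lat, lon)})


# Hypothetical usage: a 100 m fence; "b" is about 56 m north, "c" about 1.1 km north.
fence = CircularGeofence(39.95, -75.19, 100.0)
pings = [("a", 39.95, -75.19), ("b", 39.9505, -75.19), ("c", 39.96, -75.19)]
print(devices_inside(fence, pings))
```

Once a device ID is flagged as inside, an advertiser could keep targeting it long after the visit, which is the part the article finds troubling.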
Materials from the Workshop on Data and Algorithmic Transparency (DAT'16).
I discovered Cracked Labs the other day and have been reading their reports and books. They're thorough research pieces about corporate uses of our data, and should be read by anyone who wants a precise picture of how data brokers work.

[PDF] The Princeton Web Transparency and Accountability Project.

When you browse the web, hidden “third parties” collect a large amount of data about your behavior. This data feeds algorithms to target ads to you, tailor your news recommendations, and sometimes vary prices of online products. The network of trackers comprises hundreds of entities, but consumers have little awareness of its pervasiveness and sophistication.
Improve Your Privacy in the Age of Mass Surveillance.

today we’ll reclaim our privacy and improve browsing experience step-by-step. There is a difference between protecting your grandma sharing cake recipes, and a human rights activist in a hostile country. Your granny might not be the right person to sell a prepaid SIM & burner-phone to. An activist might consider the below steps entry-level basics, even dangerous if not tailored to the individual. But we all need protection. Even more so if you assume that «you got nothing to hide».

Tech

After the privacy section, let's stay in dystopian territory a little longer with this series of posts analysing ways in which Black Mirror could become reality. Data analysis and machine learning play a big part in that: I, II, III, IV, V.
AI winter is well on its way. Hacker News discussion.

Visibly the sentiment has quite considerably declined, there are much fewer tweets praising deep learning as the ultimate algorithm, the papers are becoming less "revolutionary" and much more "evolutionary". Deepmind hasn't shown anything breathtaking since their Alpha Go zero [and even that wasn't that exciting, given the obscene amount of compute necessary and applicability to games only - see Moravec's paradox]. OpenAI was rather quiet, with their last media outburst being the Dota 2 playing agent [which I suppose was meant to create as much buzz as Alpha Go, but fizzled out rather quickly]. In fact articles began showing up that even Google in fact does not know what to do with Deepmind, as their results are apparently not as practical as originally expected... As for the prominent researchers, they've been generally touring around meeting with government officials in Canada or France to secure their future grants, Yann Lecun even stepped down (rather symbolically) from the Head of Research to Chief AI scientist at Facebook. This gradual shift from rich, big corporations to government sponsored institutes suggests to me that the interest in this kind of research within these corporations (I think of Google and Facebook) is actually slowly winding down. Again these are all early signs, nothing spoken out loud, just the body language.
Thousands of AI researchers are boycotting the new Nature journal. Hacker News discussion. This has less to do with machine learning than with the traditional publishing business model.

Machine learning is a young and technologically astute field. It does not have the historical traditions of other fields and its academics have seen no need for the closed-access publishing model. The community itself created, collated, and reviewed the research it carried out. We used the internet to create new journals that were freely available and made no charge to authors. The era of subscriptions and leatherbound volumes seemed to be behind us.

The public already pays taxes that fund our research. Why should people have to pay again to read the results? Colleagues in less well-funded universities also benefit. Makerere University in Kampala, Uganda, has as much access to the leading machine-learning research as Harvard or MIT. The ability to pay no longer determines the ability to play.
Google Started a Political Shitstorm Because of Its Over-Reliance on Wikipedia. Take results from a third party, treat them as The Truth and place them next to your own. What could go wrong?
Artificial intelligence footstep recognition system could be used for airport security.

Researchers at The University of Manchester in collaboration with the Universidad Autónoma de Madrid, Spain, have developed a state-of-the-art artificial intelligence (AI), biometric verification system that can measure a human’s individual gait or walking pattern. It can successfully verify an individual simply by them walking on a pressure pad in the floor and analysing the footstep 3D and time-based data.

Visualizations

Fraction of total videogame sales per genre.
A bit off-topic compared to what I normally post here, but this is gold: NSA propaganda / motivational posters from the 50s and 60s.

Data Links is a periodic blog post published on Sundays (specific time may vary) which contains interesting links about data science, machine learning and related topics. You can subscribe to it using the general blog RSS feed or this one, which only contains these articles, if you are not interested in other things I might publish.

Have you read an article you liked and would like to suggest it for the next issue? Just contact me!

There is no comment system. If you want to tell me something about this article, you can do so via e-mail or Mastodon.