(Note to readers: I'll be on holidays for the next two weeks in a place where there are no machines, no learning and no data. I'll resume publication on the second Sunday of October.)
New data analysis competitions
- On Kaggle, Cdiscount’s Image Classification Challenge. Up to $35,000 in prizes.
Big Data May Amplify Existing Police Surveillance Practices, Study Shows. Picture me surprised.
With access to more personal data than ever before, police have the power to solve crimes more quickly, but in practice, the influx of information tends to amplify existing practices, according to sociology research at The University of Texas at Austin.
The big data landscape is changing quickly, and researchers wonder whether our political and social systems and regulations can keep up. UT Austin sociologist Sarah Brayne’s research, published in the American Sociological Review, examined for the first time how adopting big data analytics both amplifies and transforms police surveillance practices.
On Sept. 5, Cornell’s Department of Computing and Information Science kicked off the first of a series of talks that aims to discuss the importance of technological advancements and the law in exploring surveillance, privacy and bias. Prof. Arvind Narayanan, computer science, Princeton University, was the first speaker of the series and presented his research with a talk entitled “Uncovering Commercial Surveillance on the Web.”
Commercial surveillance involves techniques used by companies to discreetly and legally trace the internet activity of users. Such surveillance is so widespread that it affects anyone who uses the internet, even for basic browsing.
All the videos from this series will be posted here.
The RCMP used cellphone-tracking technology in a way that was "not lawful" six times, Canada's privacy commissioner said in a report released Thursday.
Mobile device identifiers (MDI) — also referred to as IMSI catchers — work by mimicking a cellphone tower to interact with nearby phones and read the unique ID associated with the phone's International Mobile Subscriber Identity, or IMSI. That number can then be used to track the phone, and sometimes to intercept text messages or calls.
Between 2011 and 2016 the RCMP used IMSI catchers in 125 criminal investigations, 29 of which were in support of other Canadian law enforcement agencies, the report from Daniel Therrien's office found.
This is a very good example of data journalism. A nationwide reporting adventure tracks improbably frequent lottery winners.
Lawrence Mower of the Palm Beach Post in 2014 filed a public records request for 20 years of data on Florida Lottery winners.
After analyzing the data, he found something unusual: A small number of lottery players were winning hundreds of times at almost inconceivably long odds. A statistician compared one frequent winner’s feat to picking one star out of 50 galaxies and “then having your friend guess the same star on the first try.”
Last year, two data scientists from security firm ZeroFOX conducted an experiment to see who was better at getting Twitter users to click on malicious links, humans or an artificial intelligence. The researchers taught an AI to study the behavior of social network users, and then design and implement its own phishing bait. In tests, the artificial hacker was substantially better than its human competitors, composing and distributing more phishing tweets than humans, and with a substantially better conversion rate.
The AI, named SNAP_R, sent simulated spear-phishing tweets to over 800 users at a rate of 6.75 tweets per minute, luring 275 victims. By contrast, Forbes staff writer Thomas Fox-Brewster, who participated in the experiment, was only able to pump out 1.075 tweets a minute, making just 129 attempts and luring in just 49 users.
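As a quick sanity check on the figures quoted above, here is a minimal Python sketch that turns them into comparable rates. The names `snap_r`, `human` and `summarize` are mine, not from the article, and "over 800" is treated as exactly 800, which is an assumption that makes the AI's per-tweet rate an upper bound.

```python
# Figures as quoted in the excerpt above.
# ASSUMPTION: "over 800 users" is taken as exactly 800 attempts, so the
# AI's per-tweet conversion rate computed here is an upper bound.
snap_r = {"attempts": 800, "victims": 275, "tweets_per_minute": 6.75}
human = {"attempts": 129, "victims": 49, "tweets_per_minute": 1.075}


def summarize(stats):
    """Derive duration, per-tweet conversion rate and victims per minute."""
    minutes = stats["attempts"] / stats["tweets_per_minute"]
    return {
        "minutes": minutes,
        "conversion_rate": stats["victims"] / stats["attempts"],
        "victims_per_minute": stats["victims"] / minutes,
    }


results = {"SNAP_R": summarize(snap_r), "human": summarize(human)}

for name, s in results.items():
    print(
        f"{name}: {s['minutes']:.0f} min, "
        f"{s['conversion_rate']:.1%} of tweets converted, "
        f"{s['victims_per_minute']:.2f} victims per minute"
    )
```

On these figures both runs work out to roughly two hours, and the per-tweet hit rates are in the same ballpark (the human's comes out slightly higher if the AI sent exactly 800 tweets); the gap is mostly in volume, that is, in victims per minute.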
As Foursquare’s novelty faded, user growth and investment stalled. But before the repo men had to cart off the furniture, co-founder and then-Chief Executive Officer Dennis Crowley realized the company had another card to play. While the tech world had cooled on Foursquare’s breed of apps, it had become obsessed with data about the consumers who used them. What people like on Facebook, search on Google, and write about in Gmail are all valuable leads for marketers. So is a consumer’s physical location—and Foursquare could figure that out with greater precision than anyone else.
Data Links is a periodic blog post, published on Sundays (the exact time may vary), that collects interesting links about data science, machine learning and related topics. You can subscribe to it through the general blog RSS feed or, if you are not interested in other things I might publish, through this one, which contains only these articles.
Have you read an article you liked and would you like to suggest it for the next issue? Just contact me!