Data Links #152

New data analysis competitions

On Kaggle, TrackML Particle Tracking Challenge. Up to $25,000 in prizes. This one made the news.

How-to

The Gambler Who Cracked the Horse-Racing Code (via.)

Privacy

How to Wrestle Your Data From Data Brokers, Silicon Valley — and Cambridge Analytica. This is similar to another item posted here (see Data Links #147), and it made me think: the current approach to data protection, GDPR included, is quite reactive. Perhaps the next step would be something like a general "do-not-profile" list.

My personal data, and likely yours, is in more hands than ever. Tech firms, data brokers and political consultants build profiles of what they know — or think they can reasonably guess — about your purchasing habits, personality, hobbies and even what political issues you care about.

You can find out what those companies know about you but be prepared to be stubborn. Very stubborn. To demonstrate how this works, we’ve chosen a couple of representative companies from three major categories: data brokers, big tech firms and political data consultants.
‘Forget the Facebook leak’: China is mining data directly from workers’ brains on an industrial scale.

On the surface, the production lines at Hangzhou Zhongheng Electric look like any other.

Workers outfitted in uniforms staff lines producing sophisticated equipment for telecommunication and other industrial sectors.

But there’s one big difference – the workers wear caps to monitor their brainwaves, data that management then uses to adjust the pace of production and redesign workflows, according to the company.
Don't Give Your DNA to Giant Genetic Databases.

Two years ago, I moderated a panel at SXSW called “Is Your Biological Data Safe?” Looking at the panelists—a woman who runs a DIY bio lab, 23andMe’s privacy officer, and an FBI agent—it was not hard to determine at the time that the answer was, and is, “no.”

Another item on the same topic. Cops Aren't Just Submitting DNA Samples To Genealogy Services; They're Also Obtaining Customer Info.
A slightly unrelated article. Sometimes no one needs to go around asking for subpoenas to extract information; everything law enforcement may need is already out in the open. This article about a very specific lottery fraud case includes this paragraph:

The investigators collected a decade’s worth of winners from lotteries around the country associated with the Multi-State Lottery Association. They loaded data from approximately 45,000 winning tickets into Microsoft Excel spreadsheets and searched for any connections to Eddie Tipton. They reviewed Tipton’s Facebook friends, pulled phone records and looked for matches with the spreadsheet.

For a lot of people, their social network is already out in the open, provided by themselves.
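
To give an idea of how little tooling this kind of cross-referencing needs, here is a minimal Python sketch of the matching step described in the quote. It is not the investigators' actual method (the article only mentions Excel spreadsheets); the record layout, the `normalize` and `find_connections` helpers and all the names are made up for illustration, and real name matching would be much messier than an exact comparison.

```python
def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so trivially different spellings compare equal."""
    return " ".join(name.lower().split())


def find_connections(winners, associates):
    """Return the winner records whose name also appears in the associates list.

    Both inputs are hypothetical: `winners` is a list of dicts with a "name" key,
    `associates` is a list of names (e.g. taken from a public friends list).
    """
    known = {normalize(a) for a in associates}
    return [w for w in winners if normalize(w["name"]) in known]


# Entirely made-up records, for illustration only.
winners = [
    {"name": "Alice Example", "state": "IA", "year": 2010},
    {"name": "Bob  Placeholder", "state": "CO", "year": 2005},
    {"name": "Carol Sample", "state": "WI", "year": 2007},
]
associates = ["bob placeholder", "Dave Nobody"]

matches = find_connections(winners, associates)
print(matches)
```

With a set lookup like this, checking tens of thousands of tickets against a few hundred contacts is instantaneous; the hard part is getting the contact list, and that is exactly the part people publish themselves.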
N.S.A. Triples Collection of Data From U.S. Phone Companies.

The National Security Agency vacuumed up more than 534 million records of phone calls and text messages from American telecommunications providers like AT&T and Verizon last year — more than three times what it collected in 2016, a new report revealed on Friday.

Tech

The Era of Fake Video Begins (via IP list.)

In a dank corner of the internet, it is possible to find actresses from Game of Thrones or Harry Potter engaged in all manner of sex acts. Or at least to the world the carnal figures look like those actresses, and the faces in the videos are indeed their own. Everything south of the neck, however, belongs to different women. An artificial intelligence has almost seamlessly stitched the familiar visages into pornographic scenes, one face swapped for another. The genre is one of the cruelest, most invasive forms of identity theft invented in the internet era. At the core of the cruelty is the acuity of the technology: A casual observer can’t easily detect the hoax.

This development, which has been the subject of much hand-wringing in the tech press, is the work of a programmer who goes by the nom de hack “deepfakes.” And it is merely a beta version of a much more ambitious project. One of deepfakes’s compatriots told Vice’s Motherboard site in January that he intends to democratize this work. He wants to refine the process, further automating it, which would allow anyone to transpose the disembodied head of a crush or an ex or a co-worker into an extant pornographic clip with just a few simple steps. No technical knowledge would be required. And because academic and commercial labs are developing even more-sophisticated tools for non-pornographic purposes—algorithms that map facial expressions and mimic voices with precision—the sordid fakes will soon acquire even greater verisimilitude.
Five myths about artificial intelligence.

From health care to transportation to national security, AI has the potential to improve lives. But it comes with fears about economic disruption and a brewing “AI arms race.” Like any transformational change, it’s complicated. Perhaps the biggest AI myth is that we can be confident about its future effects. Here are five others.
Revealed: how bookies use AI to keep gamblers hooked. Hacker News discussion.

The gambling industry is increasingly using artificial intelligence to predict consumer habits and personalise promotions to keep gamblers hooked, industry insiders have revealed.

Current and former gambling industry employees have described how people’s betting habits are scrutinised and modelled to manipulate their future behaviour.

“The industry is using AI to profile customers and predict their behaviour in frightening new ways,” said Asif, a digital marketer who previously worked for a gambling company. “Every click is scrutinised in order to optimise profit, not to enhance a user’s experience.”

Note that the last quoted phrase applies to basically any service that converts clicks / engagement / attention into money.
Glitch Capitalism: How Cheating AIs Explain Our Glitchy Society.

That’s what tests are for, and engineers learn from their mistakes and oversights. Liberal capitalist democracy, however, isn’t great with do-overs. In the political realm, there’s a fear that any flexible or dynamic process would be subject to tyrannical abuse, and it’s better to just wait until the next election. When it comes to property, possession is nine-tenths of the law; good luck trying to get your money back due to unfairness. And then there’s our system’s ultimate exploit: regulatory capture. That’s like if the twitchy robot used its ill-gotten energy to take over the computer and make sure the error never got patched. What looked like a glitch becomes the system’s defining characteristic, which might help explain why we all walk around now by slamming our face against the floor.
Scientists teach neural network to identify a writer's gender.

A team of researchers from the National Research Nuclear University MEPhI, the National Research Center Kurchatov Institute and the Voronezh State University has developed a new learning algorithm that allows a neural network to identify a writer's gender by the written text on a computer with up to 80 percent accuracy.

Data Links is a periodic blog post published on Sundays (specific time may vary) which contains interesting links about data science, machine learning and related topics. You can subscribe to it using the general blog RSS feed or this one, which only contains these articles, if you are not interested in other things I might publish.

Have you read an article you liked and would you like to suggest it for the next issue? Just contact me!

There is no comment system. If you want to tell me something about this article, you can do so via e-mail or Mastodon.