[...] for nearly a decade, Google did in fact keep DoubleClick's massive database of web-browsing records separate by default from the names and other personally identifiable information Google has collected from Gmail and its other login accounts.

But this summer, Google quietly erased that last privacy line in the sand — literally crossing out the lines in its privacy policy that promised to keep the two pots of data separate by default. In its place, Google substituted new language that says browsing habits "may be" combined with what the company learns from the use of Gmail and other tools.

A Federal Court judge says Canada's spy agency illegally kept potentially revealing electronic data about people over a 10-year period.

In a hard-hitting ruling made public Thursday, Justice Simon Noel said the Canadian Security Intelligence Service breached its duty to inform the court of its data-collection program, since the information was gathered using judicial warrants.

CSIS should not have retained the information since it was not directly related to threats to the security of Canada, the ruling said.


Data science continues to generate excitement and yet real-world results can often disappoint business stakeholders. How can we mitigate risk and ensure results match expectations? Working as a technical data scientist at the interface between R&D and commercial operations has given me an insight into the traps that lie in our path. I present a personal view on the most common failure modes of data science projects.

  • LipNet, machine-learning-enhanced lip reading.

Lipreading is the task of decoding text from the movement of a speaker's mouth. Traditional approaches separated the problem into two stages: designing or learning visual features, and prediction. More recent deep lipreading approaches are end-to-end trainable (Wand et al., 2016; Chung & Zisserman, 2016a). All existing works, however, perform only word classification, not sentence-level sequence prediction. Studies have shown that human lipreading performance increases for longer words (Easton & Basala, 1982), indicating the importance of features capturing temporal context in an ambiguous communication channel. Motivated by this observation, we present LipNet, a model that maps a variable-length sequence of video frames to text, making use of spatiotemporal convolutions, an LSTM recurrent network, and the connectionist temporal classification loss, trained entirely end-to-end. To the best of our knowledge, LipNet is the first lipreading model to operate at sentence-level, using a single end-to-end speaker-independent deep model to simultaneously learn spatiotemporal visual features and a sequence model. On the GRID corpus, LipNet achieves 93.4% accuracy, outperforming experienced human lipreaders and the previous 79.6% state-of-the-art accuracy.
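The pipeline the abstract describes — spatiotemporal (3D) convolutions feeding a recurrent network, trained end-to-end with the connectionist temporal classification (CTC) loss — can be sketched in a few lines of PyTorch. This is an illustrative toy, not the paper's architecture: layer counts, channel sizes, and frame dimensions here are made up for brevity.

```python
import torch
import torch.nn as nn

class LipreadingSketch(nn.Module):
    """Toy sketch of a LipNet-style pipeline: 3D convs -> LSTM -> CTC.

    All sizes are illustrative, not the configuration from the paper.
    """
    def __init__(self, vocab_size=28):  # e.g. 26 letters + space + CTC blank
        super().__init__()
        self.conv = nn.Sequential(
            # Spatiotemporal convolution: kernels span time AND space.
            nn.Conv3d(3, 32, kernel_size=(3, 5, 5), padding=(1, 2, 2)),
            nn.ReLU(),
            nn.MaxPool3d(kernel_size=(1, 2, 2)),  # pool space, keep every frame
        )
        self.rnn = nn.LSTM(input_size=32 * 16 * 16, hidden_size=128,
                           batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * 128, vocab_size)  # per-frame character logits

    def forward(self, x):
        # x: (batch, channels, time, height, width)
        feats = self.conv(x)                        # (B, 32, T, 16, 16)
        b, c, t, h, w = feats.shape
        feats = feats.permute(0, 2, 1, 3, 4).reshape(b, t, c * h * w)
        out, _ = self.rnn(feats)                    # (B, T, 256)
        return self.fc(out)                         # (B, T, vocab)

model = LipreadingSketch()
video = torch.randn(2, 3, 10, 32, 32)  # 2 clips of 10 frames, 32x32 pixels
logits = model(video)                  # (2, 10, 28)

# CTC aligns the variable-length frame predictions with a shorter
# character transcript, with no per-frame labels needed.
log_probs = logits.log_softmax(-1).permute(1, 0, 2)  # CTC wants (T, B, vocab)
targets = torch.randint(1, 28, (2, 5))               # two 5-character strings
loss = nn.CTCLoss(blank=0)(log_probs, targets,
                           torch.full((2,), 10, dtype=torch.long),
                           torch.full((2,), 5, dtype=torch.long))
```

The key point the abstract makes is that the CTC loss lets the whole stack be trained on sentence-level transcripts directly, rather than on pre-segmented word clips as in earlier word-classification work.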

Hacker News had some interesting discussion on this.


  • This week the main visualizations that are getting F5'ed constantly are poll aggregations for the US presidential elections, like the ones at FiveThirtyEight.