• Applying word vectors to the Voynich manuscript (via HN). And another ML application to text analysis: Organizing my emails with a neural net.

  • Formatting table output in R.

  • Reddit also has its 1 %.

  • Pipelines for Data Analysis, a wonderful 1-hour tutorial about R's %>% operator by none other than Hadley Wickham himself.

  • Introducing Kaggle Datasets, a repository of public datasets for centralized access. Because these datasets are available through Kaggle scripts, you can even analyze them without first downloading them to your computer.

  • Internet of Things security is so bad, there’s a search engine for sleeping kids. From the article: Shodan, a search engine for the Internet of Things (IoT), recently launched a new section that lets users easily browse vulnerable webcams. The feed includes images of marijuana plantations, back rooms of banks, children, kitchens, living rooms, garages, front gardens, back gardens, ski slopes, swimming pools, colleges and schools, laboratories, and cash register cameras in retail stores. This is closely related to the @FFD8FFDB Twitter account (more information here).

  • Many Americans say they might provide personal information, depending on the deal being offered and how much risk they face. From the article: Most Americans see privacy issues in commercial settings as contingent and context-dependent. A new Pew Research Center study based on a survey of 461 U.S. adults and nine online focus groups of 80 people finds that there are a variety of circumstances under which many Americans would share personal information or permit surveillance in return for getting something of perceived value. For instance, a majority of Americans think it would be acceptable (by a 54% to 24% margin) for employers to install monitoring cameras following a series of workplace thefts. Nearly half (47%) say the basic bargain offered by retail loyalty cards – namely, that stores track their purchases in exchange for occasional discounts – is acceptable to them, even as a third (32%) call it unacceptable.

  • The epic fail of Hollywood's hottest algorithm. From the article: Because the performance of any one movie is unpredictable, studios have always managed risk by betting on an entire slate of films. But Kavanaugh presented banks with a couple of big ideas: Borrowing a tool from Wall Street, he touted his “Monte Carlo model,” a computer program that runs thousands of simulations, as a device that could predict a film’s success far more reliably than even a sophisticated studio executive. Better, Kavanaugh convinced several studios that he could raise more money for them if they gave him access to their guarded “ultimates” numbers showing the historical or projected performance of a film across all platforms (DVD, video-on-demand, etc.) over a number of years.