New data analysis competitions
- Kaggle's TalkingData Mobile User Demographics. $25,000 in prizes. One of those competitions in which the data comes in databases and you have to build your own features. I find these considerably more fun than the ones where the dataset has already been generated.
[PDF] From Linear Models to Machine Learning, free ebook.
As a modestly sized department — policing 2 million citizens with just over 1,800 sworn officers — the San Bernardino Sheriff’s Department doesn't seem like it would be on the cutting edge of surveillance technology. But the department has quietly become one of the most productive nodes in a nationwide iris-scanning project, collecting iris data from at least 200,000 arrestees over the last two and a half years, according to documents obtained by The Verge. In the early months of 2016, the department was collecting an average of 189 iris scans each day.
R is used in every step of the data journalism process: for cleaning and processing data, for exploratory graphing and statistical analysis, for deploying models in real time, and for creating publishable data visualizations. We write R code to underpin several of our popular interactives, as well, like the Facebook Primary and our historical Elo ratings of NBA and NFL teams. Heck, we’ve even styled a custom ggplot2 theme. We even use R code on long-term investigative projects.