  • If you are a ggplot user, this quick reference will come in handy more than once.

  • Before rushing to try the latest fancy machine learning algorithm, invest some time in feature engineering. Done properly, it will improve your model's accuracy more than fine-tuning a sophisticated algorithm that is fed a lot of garbage as input. Here is another reference with several useful techniques for this. In my experience, the earth package (an R implementation of multivariate adaptive regression splines) has yielded very good results with a variety of inputs. As each problem is different, be advised: YMMV.

  • How Data Science Fueled the Largest Outreach Effort in the History of New York City. From the article: To counteract this dynamic, the Bill De Blasio administration in New York City made universal, free of cost, access to pre-kindergarten a key initiative for their first year in office. Accomplishing this required a number of different work streams. First, classroom space and programs had to be requisitioned (a massive project in and of itself). Next, parents and guardians across the city had to be made aware of the options in their neighborhood, which required the largest outreach effort in the history of New York City. The goal was to find every guardian of a 4-year-old citywide and have a volunteer call him or her with information detailing all the how/when/where’s of their local free pre-kindergarten option.

  • A bit of personal interpretation on data by one of my favorite sci-fi authors, Charles Stross: Long range forecast. From the article: So, I'm seeing a bunch of disturbing news headlines in the new year. [...] What is the news (as opposed to popular entertainment and celebrity gossip) going to be like for the next decade? Let me give you a forecast.