How-to
- Finding the world's saddest and happiest songs, by yours truly.
Privacy
For example, on November 7, John went to Yogurtland to get frozen yogurt with his girlfriend; he made a post on Twitter, added a ‘froyo’ emoji, and tagged his location. On November 9, he went to Foot Locker and bought a pair of Chuck Taylor All-Stars; he was so excited about his new shoes that he wore them out of the store, and posted a geo-tagged photo to Instagram. With just these two data points — fixing John to two moments in space and time, the yogurt shop (11/7) and the shoe store (11/9) — the analysts are able to ‘narrow down’ the dataset: they find that there is one and only one person in the entire set who went to these two places on these two days. Once John is ‘de-anonymized,’ the analysts have in their hands a detailed profile of his entire spending history.
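The narrowing-down step in that excerpt can be sketched in a few lines. This is only an illustration under stated assumptions: the records, the pseudonymous IDs (`u1`, `u2`, `u3`) and the helper `matching_ids` are all made up here, and the real analysis surely worked on a much larger dataset with richer fields.

```python
from collections import defaultdict

# Hypothetical "anonymized" purchase records: (pseudonymous_id, place, date).
# None of this is real data; it only mirrors the yogurt shop / shoe store story.
records = [
    ("u1", "Yogurtland", "11/7"),
    ("u1", "Foot Locker", "11/9"),
    ("u1", "Grocery", "11/10"),
    ("u2", "Yogurtland", "11/7"),
    ("u2", "Bookstore", "11/9"),
    ("u3", "Foot Locker", "11/9"),
    ("u3", "Cafe", "11/8"),
]


def matching_ids(records, known_points):
    """Return the IDs whose records contain every known (place, date) point."""
    visits = defaultdict(set)
    for pid, place, date in records:
        visits[pid].add((place, date))
    needed = set(known_points)
    return sorted(pid for pid, seen in visits.items() if needed <= seen)


# One public post is not enough: two people were at the yogurt shop that day.
one_point = matching_ids(records, [("Yogurtland", "11/7")])

# Two public posts single out exactly one ID.
two_points = matching_ids(
    records, [("Yogurtland", "11/7"), ("Foot Locker", "11/9")]
)

print(one_point)
print(two_points)
```

Once a single ID is left, every other row carrying that ID (the grocery trip in this toy set) is tied to the same person, which is the "detailed profile" the excerpt warns about.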
Tech
The Deep Learning textbook is a resource intended to help students and practitioners enter the field of machine learning in general and deep learning in particular. The online version of the book is now complete and will remain available online for free. The print version will be available for sale soon.
scikit-feature is an open-source feature selection repository in Python, developed by the Data Mining and Machine Learning Lab at Arizona State University. It is built on the widely used machine learning package scikit-learn and two scientific computing packages, NumPy and SciPy. scikit-feature contains around 40 popular feature selection algorithms, including traditional feature selection algorithms and some structural and streaming feature selection algorithms.
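To give a feel for what a "traditional" feature selection algorithm does, here is a minimal pure-Python sketch of a Fisher-score style ranking. It does not use scikit-feature's API; the function name `fisher_scores` and the toy data are assumptions made for illustration only.

```python
def fisher_scores(X, y):
    """Score each feature by between-class spread over within-class spread.

    X is a list of rows (lists of floats), y a list of class labels.
    A higher score means the feature separates the classes better.
    """
    n_features = len(X[0])
    classes = sorted(set(y))
    scores = []
    for j in range(n_features):
        column = [row[j] for row in X]
        overall_mean = sum(column) / len(column)
        between = 0.0
        within = 0.0
        for c in classes:
            values = [v for v, label in zip(column, y) if label == c]
            mean_c = sum(values) / len(values)
            var_c = sum((v - mean_c) ** 2 for v in values) / len(values)
            between += len(values) * (mean_c - overall_mean) ** 2
            within += len(values) * var_c
        if within == 0:
            # No spread inside any class: perfect separation or a constant feature.
            scores.append(float("inf") if between > 0 else 0.0)
        else:
            scores.append(between / within)
    return scores


# Toy data: feature 0 tracks the label, feature 1 is noise.
X = [[1.0, 5.0], [1.2, 1.0], [3.0, 4.0], [3.1, 2.0]]
y = [0, 0, 1, 1]

scores = fisher_scores(X, y)
# Feature indices ordered from most to least useful.
ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
print(scores, ranking)
```

Keeping the top few indices from `ranking` is the selection step; a library such as scikit-feature packages many scoring rules of this kind.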
- Data Science Career Questions Thread on reddit, with very good discussions on the matter.
Visualizations
- Why Batman v Superman is now one of the worst-rated blockbusters in history, charted. But, by all means, do not miss the reddit discussion about this article.
- Film Dialogue from 2,000 screenplays, Broken Down by Gender and Age.
- Plotting Fare vs Distance of ~1,000,000 NYC Yellow Taxi Trips to Determine the Effective Rate. And its corresponding /r/dataisbeautiful discussion.
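One plausible way to read an "effective rate" off a fare-vs-distance plot is to fit a straight line: the slope is the per-mile rate and the intercept is the base charge. The sketch below assumes that approach, which may not be what the linked post did, and uses made-up numbers instead of real taxi data; `fit_line` is a hypothetical helper.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up trips: distance in miles, fare in dollars (not real NYC data).
distances = [1.0, 2.0, 3.0, 4.0]
fares = [5.0, 7.5, 10.0, 12.5]

rate_per_mile, base_fare = fit_line(distances, fares)
print(f"effective rate: ${rate_per_mile:.2f}/mile, base fare: ${base_fare:.2f}")
```

With a million real trips the points would scatter around the line because of traffic, tolls and surcharges, so the fitted slope is an average rate, not the meter's posted one.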