  • The Second Annual Data Science Bowl is here. From the competition: The 2015 Data Science Bowl challenges you to create an algorithm to automatically measure end-systolic and end-diastolic volumes in cardiac MRIs. You will examine MRI images from more than 1,000 patients. This data set was compiled by the National Institutes of Health and Children's National Medical Center and is an order of magnitude larger than any cardiac MRI data set released previously. With it comes the opportunity for the data science community to take action to transform how we diagnose heart disease. Prizes for this competition add up to $200,000!

  • Cleaning CSV data using the command line and csvkit, Part 1. Because the command line is often the fastest and safest way to get a task done.

  • Common Probability Distributions: The Data Scientist's Crib Sheet. From the article: There are hundreds of probability distributions, some sounding like monsters from medieval legend like the Muth or Lomax. Only about 15 distributions turn up consistently in practice though. What are they, and what clever insights about each of them should you memorize?

  • Donald Trump is about to make us forget about the use of data in politics. Let's try to correct that with this article: Ted Cruz using firm that harvested data on millions of unwitting Facebook users. From the article: Ted Cruz’s presidential campaign is using psychological data based on research spanning tens of millions of Facebook users, harvested largely without their permission, to boost his surging White House run and gain an edge over Donald Trump and other Republican rivals, the Guardian can reveal.