  • Are you a user of the amazing R caret package? Do you sometimes wonder why, when you use svmRadial or svmLinear (or an SVM with any other kernel), the result changes a bit (or a lot) depending on the logical control variable classProbs? Here is your answer, written by none other than the author of the package.

  • Do you want to Kaggle Kaggle? Sure! So here's a lot of Kaggle competition metadata to play around with.

  • When Big Data becomes Bad Data. From the article: It’s troubling enough when Flickr’s auto-tagging of online photos label pictures of black men as “animal” or “ape,” or when researchers determine that Google search results for black-sounding names are more likely to be accompanied by ads about criminal activity than search results for white-sounding names. But what about when big data is used to determine a person’s credit score, ability to get hired, or even the length of a prison sentence?

  • Python is, increasingly, the language of choice for data journalists. Here's one very nice example of what can be achieved. From the article: Just yesterday, CBC News here in Canada did a story about a wealthy family allegedly using tax havens as a means to avoid paying taxes. As part of the news story the reporters at CBC posted some court documents and correspondence. Over my morning coffee I began reading through the court documents and wondered to myself what could be done to look at the timeline of events that some of the companies involved went through. Did they change corporate structure after being chased by the tax man? What could I do to visualize this information?