  • Cross Validation gone wrong. This is less about cross-validation itself than about leaking information by using the whole sample at a step where you should not, but it is interesting nonetheless.

  • A Visual Introduction to Machine Learning.

  • Mocking AI Panic. From the article: [...] history is repeating itself. In the mid-1940s, public reaction to reports of the new “electronic brains” was fearful. Newspapers announced that “the controlled monster” (a room-size vacuum-tube computer) could rapidly become “the monster in control,” reducing people to “degenerate serfs.” Humans would “perish, victims of their own brain products.”

  • My 1st Kaggle ConvNet: Getting to 3rd Percentile in 3 months.

  • Forecasting Influenza Epidemics in Hong Kong (academic article).

  • The FBI Built a Database That Can Catch Rapists — Almost Nobody Uses It. From the article: More than 30 years ago, the Federal Bureau of Investigation launched a revolutionary computer system in a bomb shelter two floors beneath the cafeteria of its national academy. Dubbed the Violent Criminal Apprehension Program, or ViCAP, it was a database designed to help catch the nation’s most violent offenders by linking together unsolved crimes. A serial rapist wielding a favorite knife in one attack might be identified when he used the same knife elsewhere. The system was rooted in the belief that some criminals’ methods were unique enough to serve as a kind of behavioral DNA — allowing identification based on how a person acted, rather than their genetic make-up.

  • Back Doors Won't Solve Comey's Going Dark Problem. From the article: Imagine that iMessage and Facebook and Skype and everything else US-made had his back door. The ISIL operative would tell his potential recruit to use something else, something secure and non-US-made. Maybe an encryption program from Finland, or Switzerland, or Brazil. Maybe Mujahedeen Secrets. Maybe anything. (Sure, some of these will have flaws, and they'll be identifiable by their metadata, but the FBI already has the metadata, and the better software will rise to the top.) As long as there is something that the ISIL operative can move them to, some software that the American can download and install on their phone or computer, or hardware that they can buy from abroad, the FBI still won't be able to eavesdrop. And by pushing these ISIL operatives to non-US platforms, they lose access to the metadata they otherwise have.

  • This one comes via Reddit: A new study from the University of Pennsylvania has found that 91% of health-related webpages relay sensitive information to third parties, including Google, Facebook and even data brokers such as Experian. From the article: 91% of health-related pages relay the URL to third parties, often unbeknownst to the user, and in 70% of the cases, the URL contains sensitive information such as "HIV" or "cancer" which is sufficient to tip off these third parties that you have been searching for information related to a specific disease.