Data Science Weekly - Issue 51
Issue #51 Nov 13 2014
Editor Picks
The Hipster Effect: An IPython Interactive Exploration
This week I started seeing references all over the internet to this paper: The Hipster Effect: When Anticonformists All Look The Same. It essentially describes a simple mathematical model of conformity and non-conformity among a mutually interacting population, and finds some interesting results...
The Learning Behind Gmail Priority Inbox
The Priority Inbox feature of Gmail ranks mail by the probability that the user will perform an action on that mail. Because “importance” is highly personal, we try to predict it by learning a per-user statistical model, updated as frequently as possible. This research note describes the challenges of online learning over millions of models, and the solutions adopted...
Google, Spotify, & Pandora bet a computer could generate a better playlist than you can
Google, Pandora, and Spotify haven’t exactly advertised it, but they are all working on using a type of artificial intelligence called “deep learning” to make a better music playlist for you...
Data Science Articles & Videos
Music Information Retrieval using Scikit-learn
Music information retrieval (MIR) is an interdisciplinary field bridging the domains of statistics, signal processing, machine learning, musicology, biology, and more. In this talk, Steve Tjoa from Humtap surveys common research problems in MIR, including music fingerprinting, transcription, classification, and recommendation, and recently proposed solutions in the research literature...
Text and Image Analysis: From pixels to characters and back
While text and images differ in many ways and can exist independently, they are in fact complementary and non-competing communication mediums, and to get a holistic view of the world, we would need to analyze both. Understanding images is as important as understanding text, as together they provide a more accurate picture of reality...
Data Science with Hadoop - predicting airline delays - part 1
Every year approximately 20% of airline flights are delayed or cancelled, resulting in significant costs to both travellers and airlines. As our example use-case, we will build a supervised learning model that predicts airline delay from historical flight data and weather information...
Python as part of a production machine learning stack [at Stripe]
While the vast majority of transactions facilitated by Stripe are honest, we do need to protect our merchants from rogue individuals and groups seeking to "test" or "cash" stolen credit cards. To combat this sort of activity, Stripe uses Python (together with Scala and Ruby) as part of its production machine learning pipeline to detect and block fraud in real time. In this talk, I'll go through the scikit-based modeling process for a sample data set that is derived from production data to illustrate how we train and validate our models...
Memory Networks
We describe a new class of learning models called memory networks. Memory networks reason with inference components combined with a long-term memory component; they learn how to use these jointly...
Markov Chains vs Simulation: Flipping a Million Little Coins
I saw an interesting question on Reddit the other day. The problem was about estimating the amount of decaying radioactive isotopes in a sample after a set amount of time. I don’t think anyone in the thread brought up Markov chains, but that’s what I immediately thought of...
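To make the comparison in this item concrete, here is a minimal Python sketch of the two approaches. It is our own illustration, not code from the linked post. It assumes each atom decays independently with a fixed per-step probability; the names `p_decay`, `expected_remaining`, and `simulate_remaining` are hypothetical and chosen only for this example.

```python
import random


def expected_remaining(n_atoms, p_decay, n_steps):
    """Markov-chain view: an atom is either 'intact' or 'decayed'.

    An intact atom stays intact with probability (1 - p_decay) at each
    step, so the expected number still intact after n_steps has a
    closed form and needs no simulation.
    """
    return n_atoms * (1.0 - p_decay) ** n_steps


def simulate_remaining(n_atoms, p_decay, n_steps, seed=0):
    """Simulation view: flip one biased coin per intact atom per step."""
    rng = random.Random(seed)  # seeded so repeated runs give the same result
    remaining = n_atoms
    for _ in range(n_steps):
        # Count the atoms whose coin flip says "survived this step".
        remaining = sum(1 for _ in range(remaining) if rng.random() >= p_decay)
    return remaining


if __name__ == "__main__":
    # Illustrative values only (assumptions, not taken from the post).
    n_atoms, p_decay, n_steps = 10_000, 0.1, 5
    print("Markov chain expectation:", expected_remaining(n_atoms, p_decay, n_steps))
    print("One simulated run:       ", simulate_remaining(n_atoms, p_decay, n_steps))
```

The closed form costs a single multiplication, while the simulation flips one coin per surviving atom per step, which is roughly the trade-off the title alludes to.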
What are the most common mistakes made by aspirational data scientists?
Thinking that data science is only about math and computer science...
Random feedback weights support learning in deep neural networks
We show that a network can learn to extract useful information from signals sent through random feedback connections. In essence, the network learns to learn. We demonstrate that this new mechanism performs as quickly and accurately as backpropagation on a variety of problems and describe the principles which underlie its function...
Anomalies, Concerts & Data Science at the Command Line: Jeroen Janssens Interview
We recently caught up with Jeroen Janssens, author of Data Science at the Command Line. We were keen to learn more about his background, his recent work at YPlan and his work creating both the book and the (related) Data Science Toolbox project...
Universities Can't Train Data Scientists Fast Enough for CIOs
CIOs are struggling to find Data Scientists...
Jobs
Senior Data Scientist - L’Oreal: Connected Beauty Incubator - New Jersey
As a Senior Data Scientist in the Incubator Organization, this position involves utilizing novel tools for "big data" science. The individual will work in the Hadoop environment to analyze large volumes of data, ultimately using statistical and data mining tools including clustering, classification, and regression models to understand and predict consumer needs. The Senior Data Scientist will present technical insights to management to inform product development...
Training & Resources
D3 Deconstructor
The D3 Deconstructor is a Google Chrome extension for extracting data from D3.js visualizations...
The Design and Implementation of Probabilistic Programming Languages
This book explains how to implement PPLs by lightweight embedding into a host language...
Recurrent Neural Networks with Word Embeddings
In this tutorial, you will learn how to do Word Embeddings using Recurrent Neural Networks architectures with Context Windows - in order to perform Semantic Parsing / Slot-Filling (Spoken Language Understanding)...
Books
Foundations of Machine Learning
"This book gives an unbiased presentation of machine learning with solid theoretical justifications. It discusses the principles behind the design of learning algorithms by introducing and using the most modern tools and concepts in learning theory. This helps answering many fundamental questions..."
For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.
P.S. Enjoyed the newsletter? Please forward it to friends and peers - we'd love to have them onboard too :-) - All the best, Hannah & Sebastian