Data Science Weekly - Issue 30
Issue #30, June 19, 2014
Editor Picks
Architecting a Machine Learning System for Risk (Airbnb) Different risk vectors can require different architectures. For example, some risk vectors are not time critical, but require computationally intensive techniques to detect. An offline architecture is best suited for this kind of detection. For the purposes of this post, we are focusing on risks requiring realtime or near-realtime action. From a broad perspective, a machine-learning pipeline for these kinds of risk must balance two important goals...
Conjecture: Scalable Machine Learning in Hadoop with Scalding (Etsy) Predictive machine learning models are an important tool for many aspects of e-commerce. At Etsy, we use machine learning as a component in a diverse set of critical tasks...
Optimizing the Netflix Streaming Experience with Data Science Netflix is committed to delivering outstanding streaming service and is investing heavily in advancing the state of the art in adaptive streaming algorithms and network technologies such as Open Connect to optimize streaming quality. To put even more focus on "streaming science," we've created a new team at Netflix that's working on innovative approaches for using our data to improve QoE. In this post, I will briefly outline the types of problems we're solving...
Data Science Articles & Videos
Chat with Andrew Ng, Co-Founder, Coursera; Director, Stanford AI Lab
In-depth interview with Andrew Ng...
World Cup Learning
An IPython notebook using PyBrain to learn and predict World Cup outcomes...
Predictive Analytics by Oliver Griesel
Presentation given at USI 2014, Paris. Contrasts Big Data with Analytics and explores how the two interact...
Why I switched to Julia
The following story, which I originally posted to The COBE Blog, explains why I began programming in Julia. Since then, I have found that Julia improves the performance of my other econometric estimators...
Experience optimization at zulily
Experimentation is the name of the game for most top tech companies, and it’s no different here at zulily...
First, Second Derivative, Convolution and Quadratic Fitting via MCMC
In this post: First, how we approach fitting a curve to a perfect quadratic function, using first-order and second-order derivatives of the function. Second, how one can fit a quadratic function via Markov Chain Monte Carlo (MCMC) using PyMC. Last, how convolution could be used for numerical differentiation to estimate the coefficients of a quadratic function...
Frequentism and Bayesianism IV: How to be a Bayesian in Python
Here I want to back away from the philosophical debate and go back to more practical issues: demonstrating how you can apply Bayesian ideas in Python...
Be the first to try Microsoft's new Machine Learning service
With machine learning, computers can approach human performance in perception and understanding across vast amounts of data. Expensive and disconnected tools stood in the way of this innovation, but today Microsoft is democratizing machine learning...
How a Russian mathematician constructed a decision tree - by hand - to solve a medical problem
Here’s an excerpt from Love and Math, a book by Edward Frenkel. One of the stories is about how during his studies in the 80s he built a decision tree to help with kidney transplants. There was no machine to learn from data so humans had to do the work...
Jobs
Director of Analytics - Coursera - Mountain View, CA Coursera is focused on creating universal access to the world’s best University education. In less than two years we’ve brought over 500 courses to more than 6 million students worldwide, and we’re just getting started. Data Science is at the core of how we are achieving this mission - not surprising given that our founders are both pre-eminent Stanford professors in Machine Learning. Coursera is looking for a leader for our growing analytics organization...
Training & Resources
Deep Neural Networks: A Getting Started Tutorial
Deep Neural Networks are the more computationally powerful cousins to regular neural networks. Learn exactly what DNNs are and why they are the hottest topic in machine learning research...
Machine Learning is fun!
The world’s easiest introduction to Machine Learning...
Practical Deep Learning Lecture: Machine Perception and Its Applications (Deeplearning4j)
Adam Gibson (Data Scientist and Co-Founder, Blix.io) presents his open-source, distributed deep-learning framework, Deeplearning4j. He demos sentiment analysis and facial recognition tools...
Books
The Elements of Statistical Learning: Data Mining, Inference, and Prediction Not new, though one of the most comprehensive books in the space...
"The good news is, this is pretty much the most important book you are going to read in the space. It will tie everything together for you in a way that I haven't seen any other book attempt."
For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages, check out our resources page.
P.S. Did you enjoy the newsletter? Do you have friends/colleagues who might like it too? If so, please forward it along - we would love to have them onboard :)