Data Science Weekly - Issue 72
Issue #72 April 9 2015
Editor Picks
Neural Slime Volleyball
Recurrent net learns to play 'neural slime volleyball' in JavaScript. Can you beat them?...
Creating a Data-Driven Organization: Two Years On
This is the third post in a series documenting the process of creating a more data-driven organization at Warby Parker...
The Amazing, Autotuning Sandpile
A simple mathematical model of a sandpile shows remarkably complex behavior...
Data Science Articles & Videos
Interview with Greg Linden - Developer of Amazon Recommendation Engine
As you know, we’ve been trying to cover, from every angle, the innovations that ecommerce sites in general, and Amazon.com specifically, brought to the world. That is why I was thrilled to get to speak with Greg Linden, one of the Amazon engineers responsible for a lot of the personalization and data-driven innovations at Amazon, especially the recommendation engine...
Interview: Alessandro Gagliardi, Glassdoor on the Data Scientist Job
We discuss interesting trends, motivation, different aspects of the data scientist job, advice, and more...
Markov Chain Monte Carlo Without all the Bullshit
I have a little secret: I don’t like the terminology, notation, and style of writing in statistics. I find it unnecessarily complicated. So to counter, here’s my own explanation of Markov Chain Monte Carlo...
An Experimental Bird Migration Visualization
Every year hundreds of millions of birds migrate to and from their wintering and breeding grounds, often traveling hundreds, if not thousands, of kilometers twice a year... one tool, radar, has the ability to measure the mass flow of migrants both day and night at a temporal and spatial resolution that cannot be matched by any other monitoring tools...
Machine Learning at American Express: Benefits and Requirements
Curious to know how American Express uses machine learning successfully, in production, at very large scale? An audience of over 300 recently got a peek into this big data story thanks to a presentation by Chao Yuan, SVP at American Express, who heads their Modeling and Decision Science Team for US Consumer Business...
End-to-End Training of Deep Visuomotor Policies
Policy search methods based on reinforcement learning and optimal control can allow robots to automatically learn a wide range of tasks. However, practical applications of policy search tend to require the policy to be supported by hand-engineered components for perception, state estimation, and low-level control. We propose a method for learning policies that map raw, low-level observations, consisting of joint angles and camera images, directly to the torques at the robot's joints...
Bug Prediction at Google
As Google's code base and teams increase in size, it becomes more unlikely that the submitter and reviewer will even be aware that they're changing a hot spot. In order to help identify these hot spots and warn developers, we looked at bug prediction. Bug prediction uses machine-learning and statistical analysis to try to guess whether a piece of code is potentially buggy or not, usually within some confidence range...
Steven Levitt: Drawing Causal Inference from Big Data
This talk was given as part of the National Academy of Sciences Sackler Colloquium in Washington, D.C. on March 26-27, 2015...
VisuAlgo
Visualising data structures and algorithms through animation...
Jobs
Data Scientist - PlaceIQ - New York, NY
PlaceIQ is a rapidly growing, venture-funded “Big Data” business with a tremendous opportunity to become the market leader in the exploding location intelligence marketplace. Data Scientists analyze PlaceIQ hyperlocal data sources to develop accurate predictions of audience and behavior. A mixture of analytical approaches is employed, including raw data mining, data visualization, application of rules and heuristics, and supervised / unsupervised machine learning techniques...
Training & Resources
Baby Boom: An Excel Tutorial on Analyzing Large Data Sets
Ever wanted to use Excel to examine big data sets? This tutorial will show you how to analyze over 300,000 items at one time. And what better topic than baby names?...
Pandas / dplyr comparison for pandas 0.16
This notebook compares pandas and dplyr. The comparison is just on syntax (verbiage), not performance. Whether you're an R user looking to switch to pandas (or the other way around), I hope this guide will help ease the transition...
DeepCL
OpenCL library to train deep convolutional neural networks...
Books
Learn R in a Day
Clear and efficient way to get up and running in R...
"I was delighted with this little book...it got me functional with R, able to enter, manipulate, and plot data usefully in less than 8 hours of work..."
For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.
P.S. Enjoyed the newsletter? Please forward it along to friends and colleagues - we'd love to have them onboard! - All the best, Hannah & Sebastian