Data Science Weekly - Issue 74
Issue #74, April 23, 2015
Editor Picks
Finally, Neural Networks That Actually Work
As one of Google’s earliest engineers, Dean helped create the fundamental computing systems that underpin the company’s vast online empire, systems that span tens of thousands of machines. This work gave him celebrity-like status among Silicon Valley engineers—people recognize him as he walks through the Google cafeteria. Now, armed with those massively distributed systems and the ideas that drive them, he has returned to the world of neural networks. And this time, these artificial brains work remarkably well...
Competing in a data science contest without reading the data
In this post, I will describe a method to climb the public leaderboard without even looking at the data. The algorithm is so simple and natural that an unwitting analyst might just run it...
We Need Algorithmic Angels
A lot has been written on how algorithms are manipulating this and that in today’s Internet. However, there haven’t been many concrete proposals about how to create more human-centered algorithmic solutions. For example, do we need algorithms that are on our side?...
A Message from this week's Sponsor
Want to be a Data Scientist, but don't know where to start?
Learn essential Data Science skills in SlideRule's Intro to Data Science Workshop. In this online bootcamp, you'll learn R, data wrangling, analytics and visualization by working on real projects, with 1-on-1 mentorship from expert Data Scientists from LinkedIn, Glassdoor, Trulia and Stripe.
Spots are limited; registration ends in 48 hours!
Data Science Articles & Videos
Hacking an epic NHL goal celebration with a hue light show and real-time machine learning
Below is a YouTube clip of the epic goal celebration hack in action. In a single sentence, I trained a machine learning model to detect in real-time that a goal was just scored by the Habs based on the live audio feed of a game and to trigger a light show using Philips Hue lights in my living room...
Why GEMM is at the heart of deep learning
I spend most of my time worrying about how to make deep learning with neural networks faster and more power efficient. In practice that means focusing on a function called GEMM...
The Non-parametric Bootstrap as a Bayesian Model
In this post I will show how the classical non-parametric bootstrap of Efron (1979) can be viewed as a Bayesian model. I will start by introducing the so-called Bayesian bootstrap and then I will show three ways the classical bootstrap can be considered a special case of the Bayesian bootstrap...
Beyond Short Snippets: Deep Networks for Video Classification
We evaluated two approaches - feature pooling networks and recurrent neural networks (RNNs) - capable of modeling variable length videos with a fixed number of parameters while maintaining a low computational footprint. In doing so, we were able to not only show that learning a high level global description of the video’s temporal evolution is very important for accurate video classification, but that our best networks exhibited significant performance improvements over previously published results on the Sports 1 million dataset (Sports-1M)...
IBM's Watson Designed The Worst Burrito I've Ever Had
It’s the worst burrito I’ve ever had. I don’t know another way to say it. I’m staring at my plate in disbelief. Could burritos be bad? Yes, yes, I’d just learned. But that’s not the biggest shocker. The biggest shocker is that this recipe was largely designed by Watson, IBM’s best artificial intelligence, one that had already fed me one of the most uniquely delicious BBQ sauces I’d ever eaten...
Understanding Bayes: A Look at the Likelihood
One thing that often gets left out of the discussion is the importance of the likelihood. The likelihood is the workhorse of Bayesian inference. In order to understand Bayesian parameter estimation you need to understand the likelihood. In order to understand Bayesian model comparison (Bayes factors) you need to understand the likelihood and likelihood ratios...
Compressing Neural Networks with the Hashing Trick
As deep nets are increasingly used in applications suited for mobile devices, a fundamental dilemma becomes apparent: the trend in deep learning is to grow models to absorb ever-increasing data set sizes; however, mobile devices are designed with very little memory and cannot store such large models. We present a novel network architecture, HashedNets, that exploits inherent redundancy in neural networks to achieve drastic reductions in model sizes...
Using ICCs to Calculate the Effect of the Quarterback
I want to answer one of Bill Connelly’s 45 Reasons to Care about College Football Analytics. These are a set of questions Bill created to drive interest in analyzing college football data. Specifically, I want to begin to address question #5, quantifying how important the quarterback is to the offense...
Practical Machine Learning For The Uninitiated
Last fall when I took on ShippingEasy’s machine learning problem, I had no practical experience in the field. Getting such a task put on my plate was somewhat terrifying, and even more so as we started to wade into the waters of machine learning. Ultimately, we overcame those obstacles and delivered a solution that allowed us to automate our customers’ actions with greater than 95% accuracy. Here are some of the challenges that we experienced when applying machine learning to the shipping & fulfilment domain, and how we broke through them...
Jobs
Senior Software Development Engineer, Big Data - Kiva Systems - MA
Kiva Systems (a robotics subsidiary of Amazon) is seeking a curious, highly motivated and talented Senior Software Engineer - Big Data with experience in NoSQL and big data technologies such as Hadoop and Spark to join the Scout Development group. The Scout Development group develops data frameworks, dashboards and an analytics frontend to help analysts, fulfillment center (FC) operations and senior management understand and improve the performance of a Kiva FC. The team currently deals with terabytes of data, 300+ performance indices and a million page views with 99.9% uptime and is continually looking for elegant solutions to scale the data infrastructure...
Training & Resources
Jeff Dean: Large Scale Machine Learning for Predictive Tasks, Pt. 1
Video of Jeff Dean's RecSys 2014 Keynote...
Deep Learning for Dummies
The aim of the talk was to give a high level idea of what deep learning is, starting from machine learning and decision trees and how deep learning figures things out automatically...
Modern Methods for Sentiment Analysis
Overview of main methods...
Books
Advanced Analytics with Spark: Patterns for Learning from Data at Scale
NEW RELEASE: Four Cloudera data scientists present a set of self-contained patterns for performing large-scale data analysis with Spark...
"If you want to get a better understanding of why Spark is so disruptive in Data Science and why it truly democratizes the whole Big Data space - this book is a great primer for that..."
For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.
P.S. Enjoyed the newsletter? Please forward it along to friends and colleagues - we'd love to have them onboard!
All the best, Hannah & Sebastian