Data Science Weekly - Issue 128
Issue #128 May 5 2016
Editor Picks
The Special Relationship Between Noodles and Qdoba
I’ve had a theory that for every Noodles, there’s a Qdoba that’s right next door. It might be some sort of selection bias, however, since I can think of a couple of locations where they’re directly next to each other. To me, Noodles and Qdoba have a special relationship, at least compared to other restaurants. I figured now was about the time I should test this, and I can use Chipotle to test...
A Data Scientist Dissects the 2016 NFL Draft
Jared Lander, who helped the Minnesota Vikings ace the draft a year ago, breaks down the best prospects of this year’s class...
A Neural Network that Dreams in Resumes
If a neural network can write Shakespeare, could it write a resume for you? Inspired by the remarkable results of Recurrent Neural Networks and using thousands of anonymized resumes from untapt, I’ve been experimenting with applying deep learning techniques to the CV...
A Message from this week's Sponsor:
Whitepaper: A Practical Guide to Building Data Driven Products Beyond Analysts' Laptops via @YhatHQ
Learn how to apply data science insights to the real world. Discover the implications beyond analysts’ laptops and answer the question of what to do with predictive models once they’re built.
Data Science Articles & Videos
How to get into the top 15 of a Kaggle competition using Python
Doing well in a Kaggle competition requires more than just knowing machine learning algorithms. It requires the right mindset, the willingness to learn, and a lot of data exploration. Many of these aspects aren’t typically emphasized in tutorials on getting started with Kaggle, though. In this post, I’ll cover how to get started with the Kaggle Expedia hotel recommendations competition, including establishing the right mindset, setting up testing infrastructure, exploring the data, creating features, and making predictions...
Extreme Style Machines: Using Random Neural Networks to Generate Textures
Wait, what! Generating high-quality images based on completely random neural networks? That’s the unreasonable effectiveness of deep representations...
Finding Similar Music using Matrix Factorization
This post is a step by step guide on how to calculate related artists using a couple of different matrix factorization algorithms. The code is written in Python using Pandas and SciPy to do the calculations and D3.js to interactively visualize the results...
[Video] How Machine Learning Amplifies Inequality in Society
In this talk, Mike Williams, Research Engineer at Fast Forward Labs, looks at how supervised machine learning has the potential to amplify power and privilege in society. Using sentiment analysis, he demonstrates how text analytics often favors the voices of men. Mike discusses how bias can inadvertently be introduced into any model, and how to recognize and mitigate these harms...
Neural Networks Are Impressively Good At Compression
I hope I have given you an intuition for how neural networks can compress patterns in few weights. They use the full range of the weights to the point where a connection activated with a strong input can mean something entirely different than the same connection activated with a weak input. And best of all I didn’t have to teach them to do this. They just start behaving like this if you force them to express a complex pattern in few connections...
Artistic style transfer for videos
We present an approach that transfers the style from one image (for example, a painting) to a whole video sequence. Supplementary video accompanying the paper...
Terrain-Adaptive Locomotion Skills Using Deep Reinforcement Learning
Reinforcement learning offers a promising methodology for developing skills for simulated characters, but typically requires working with sparse hand-crafted features. Building on recent progress in deep reinforcement learning (DeepRL), we introduce a mixture of actor-critic experts (MACE) approach that learns terrain-adaptive dynamic locomotion skills using high-dimensional state and terrain descriptions as input, and parameterized leaps or steps as output actions...
The Descriptor Protocol, and Python Black Magic
Since I graduated last summer, I have been writing lots of both Python 2 and 3. This snippet seemed like something I should understand well. However, I did not, so this post is an attempt to solve that...
Jobs
Senior Data Scientist - SimpleReach - New York
SimpleReach is seeking a seasoned data scientist to join our ranks. This mathematically savvy individual will be on the front lines, wrangling data and investigating our massive stores of traffic events while also building machines to intelligently classify content and build recommendation engines for a wide range of applications...
Training & Resources
An Introduction to Scientific Python (and a Bit of the Maths Behind It) - Matplotlib
In this series of posts, we will take a look at the main libraries used in scientific Python and learn how to use them to bend data to our will. We won't just be learning to churn out template code, however; we will also learn a bit of the maths behind it so that we can understand what is going on a little better. So let's kick things off with an incredibly useful little number that we will be using throughout this series of posts: Matplotlib...
D3 Basic Pie Chart Video Tutorial
You will use the CSV data from the D3js.org website Pie Chart example to see how a full D3 Pie Chart example data visualization is built...
Identify, describe, plot, and remove the outliers from the dataset
There are different methods to detect outliers, including the standard deviation approach and Tukey’s method, which uses the interquartile range (IQR). In this post I will use Tukey’s method because I like that it does not depend on the distribution of the data...
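For readers who want a feel for Tukey's method before clicking through, here is a minimal sketch in Python using only the standard library. It is not the linked post's code: the function name `tukey_outliers`, the sample data, and the conventional fence multiplier of 1.5 are all assumptions made for illustration.

```python
from statistics import quantiles


def tukey_outliers(values, k=1.5):
    """Return a list of booleans marking values outside Tukey's fences.

    The fences are Q1 - k*IQR and Q3 + k*IQR, where IQR = Q3 - Q1.
    k=1.5 is the conventional choice (an assumption here, not taken
    from the linked post).
    """
    # "inclusive" interpolates linearly between the sorted data points.
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v < lower or v > upper for v in values]


# Hypothetical sample: six ordinary readings and one obvious outlier.
sample = [10, 12, 11, 13, 12, 11, 95]
mask = tukey_outliers(sample)
cleaned = [v for v, is_outlier in zip(sample, mask) if not is_outlier]
print(cleaned)
```

Because the fences are built from quartiles rather than the mean and standard deviation, a single extreme value such as 95 barely shifts them, which is one reason the method copes well with skewed data.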
Books
Street-Fighting Mathematics: The Art of Educated Guessing and Opportunistic Problem Solving
Not a new book, though very well reviewed...
"This book is a treasure trove of intuitive, practical, and brilliant mathematical techniques. Every person with an interest in mathematics, science, or engineering will enjoy this highly stimulating and fun book."
For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.
P.S. Interested in reaching fellow readers of this newsletter? Consider sponsoring! Email us for details :) - All the best, Hannah & Sebastian