Data Science Weekly - Issue 110
Issue #110, December 31, 2015
Editor Picks
The Missing 11th of the Month
An xkcd once wondered why the 11th of most months is mentioned less than the other days. I think I figured it out...
Recurrent Net Dreams Up Fake Chinese Characters with TensorFlow
This is the third post in a series of blog posts logging my experiments with TensorFlow. We will be modifying and extending Graves’ approach to get LSTM + MDN to generate fake Chinese characters in vector format...
Are ML and Statistics Complementary?
Why Machine Learning needs Statistics by Prof. M Welling...
A Message from this week's Sponsor:
Create D3 Data Visualizations As Fast As You Can Sketch
You need to create a D3.js data visualization to communicate your insights. But... #d3BrokeAndMadeArt! This time, your data join appears to have broken and the JavaScript console shows an error you don't recognize. Last time, you got stuck trying to figure out how to make axes that didn't look like a 3rd grader made them. It makes you want to strangle D3 with your bare hands. Just how steep does the D3 learning curve need to be?!
What if you could learn and master D3 quickly and deeply? Great news! - You can ... Check out the DashingD3js.com Introductory D3.js Training today.
Data Science Articles & Videos
Our Best Health & Fitness Data Stories of 2015
We love data here at Jawbone. We love to see stats on how people are sleeping, how active they are, what daily routines improve lives, and which are a detriment to our health. This year we shared some health and fitness data stories that people loved—but some of you may have missed. So, here’s a list of my favorite data stories from 2015. I hope it helps you make more informed, healthy decisions next year, and every year!...
Has there been a ‘pause’ in global warming?
A recent focus of this debate has been whether temperature records show a ‘pause’ (or ‘hiatus’) in global warming over the last 10 to 20 years (or at least a ‘slowdown’ compared to the previous trend), and if so, what it might mean...
Monopoly Simulations
In this document I'll have a go at some simulations of Monopoly... we will find out that the 'going to jail' mechanic of the game causes some drastic imbalances which you may want to be mindful of...
N.F.L. Playoff Picture: Every Team’s Remaining Paths to the Postseason
With 30 games remaining in the season, there are about 1 billion ways the season could end. Which of those outcomes are best for your team? It’s questions like these that led us to build our N.F.L. simulator, to let you explore which games matter the most for your team’s playoff chances...
Algorithms of the Mind: What Machine Learning Teaches Us About Ourselves
“Science often follows technology, because inventions give us new ways to think about the world and new phenomena in need of explanation.”...
A non-comprehensive list of awesome things other people did in 2015
I wrote this post because a blog often feels like a place to complain, but we started Simply Stats as a place to be pumped up about the stuff people were doing with data. This year's list is particularly "off the cuff" so I'd appreciate additions if you have 'em...
Replacing Sawzall — a case study in domain-specific language migration
In this post, we’ll describe Sawzall’s role in Google’s analysis ecosystem, explain some of the problems we encountered as Sawzall use increased which motivated our migration, and detail the techniques we applied to achieve language-agnostic analysis while maintaining strong access controls and the ability to write fast, scalable analyses...
Interactive Analytics on Dynamic Big Data in Python using Kudu, Impala, Ibis
I spent this last week expanding the Kudu Python client (a Cython wrapper for the C++ client API) and adding initial integration with Ibis. While my Kudu patch is still in code review, I will give you a preview here of how it all works...
Statistics for Hackers
The field of statistics has a reputation for being difficult to crack: it revolves around a seemingly endless jargon of distributions, test statistics, confidence intervals, p-values, and more, with each concept subject to its own subtle assumptions. But it doesn't have to be this way...
How to describe your Personal Projects on your Data Science resume
You’ve been told the Projects section is important to include on your resume, but the advice stopped there...
Jobs
Principal Data Scientist - LendUp - San Francisco, CA
Millions of people don’t have access to the credit they deserve. Their options for borrowing money are limited and unfair, and it’s easy to become trapped in debt. This is the first problem LendUp set out to solve, by applying our combined expertise in software engineering, data science, credit, education, and our shared passion for justice...
Training & Resources
Evaluation of Deep Learning Toolkits
In this study, I evaluate some popular deep learning toolkits. The candidates are listed in alphabetical order: Caffe, CNTK, TensorFlow, Theano, and Torch...
How To Make Python Run As Fast As Julia
Should we ditch Python and other languages in favor of Julia for technical computing? That's certainly a thought that comes to mind when one looks at the benchmarks...
Demystifying Deep Reinforcement Learning
In this blog post I will be trying to demystify this technique and understand the rationale behind it...
Books
Prime Obsession: Bernhard Riemann and the Greatest Unsolved Problem in Mathematics
Fascinating account of a mathematical mystery that continues to challenge...
"Prime Obsession is a delight: a book about a hypothesis on the distribution of prime numbers that reads like a gripping mystery..."
For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.
P.S. Interested in reaching fellow readers of this newsletter? Consider sponsoring! Email us for details :) - All the best, Hannah & Sebastian