Data Science Weekly - Issue 154
Issue #154 Nov 3 2016
Editor Picks
Is Bayesian A/B Testing Immune to Peeking? Not Exactly
Since I joined Stack Exchange as a Data Scientist in June, one of my first projects has been reconsidering the A/B testing system used to evaluate new features and changes to the site. Our current approach relies on computing a p-value to measure our confidence in a new feature. Unfortunately, this leads to a common pitfall in performing A/B testing, which is the habit of looking at a test while it’s running, then stopping the test as soon as the p-value reaches a particular threshold...
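To see why the peeking habit described above is a pitfall, here is a minimal A/A simulation sketch in Python (standard library only). Both arms share the same true conversion rate, so every "significant" result is a false positive. The function names and parameters (10% conversion rate, batches of 100 users per arm, 20 looks, alpha = 0.05) are hypothetical choices for illustration, not Stack Exchange's setup, and the sketch covers only the frequentist p-value case, not the Bayesian analysis the article goes on to examine.

```python
import math
import random


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value of a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (conv_a / n_a - conv_b / n_b) / se
    # Two-sided standard-normal tail: P(|Z| >= |z|) = erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2))


def run_aa_test(rng, rate, batch, n_batches, alpha, peek):
    """Simulate one A/A test (no true difference); True means a 'winner' was declared."""
    conv_a = conv_b = n = 0
    for _ in range(n_batches):
        conv_a += sum(rng.random() < rate for _ in range(batch))
        conv_b += sum(rng.random() < rate for _ in range(batch))
        n += batch
        # Peeking: check after every batch and stop at the first "significant" p-value
        if peek and two_proportion_p_value(conv_a, n, conv_b, n) < alpha:
            return True
    # Fixed horizon: a single test once all batches are in
    return two_proportion_p_value(conv_a, n, conv_b, n) < alpha


def false_positive_rate(peek, trials=1000, seed=0):
    """Fraction of simulated A/A tests that wrongly declare a difference."""
    rng = random.Random(seed)
    hits = sum(
        run_aa_test(rng, rate=0.1, batch=100, n_batches=20, alpha=0.05, peek=peek)
        for _ in range(trials)
    )
    return hits / trials


if __name__ == "__main__":
    print("fixed horizon:", false_positive_rate(peek=False))
    print("peeking:      ", false_positive_rate(peek=True))
```

With a single test at the planned sample size, the false-positive rate should stay near the nominal 5%; stopping at the first of 20 looks that crosses the threshold inflates it several-fold.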
Ten Ways Your Data Project is Going to Fail
Data science continues to generate excitement, and yet real-world results can often disappoint business stakeholders. How can we mitigate risk and ensure results match expectations? Working as a technical data scientist at the interface between R&D and commercial operations has given me insight into the traps that lie in our path. I present a personal view on the most common failure modes of data science projects...
Predicting the Presidential Election
With the presidential election less than a week out, I thought it would be fun to make my own predictions about the race. There are plenty of blogs and websites that forecast the election, but there aren't that many that tell you exactly how their "secret models" work OR show you how to do it yourself. Well, the good news is that's exactly what I'm going to do ;)...
A Message from this week's Sponsor:
Harness the business power of big data.
How far could you go with the right experience and education? Find out at Capitol Technology University. Earn your PhD in Management & Decision Sciences in as little as three years, in convenient online classes. Banking, healthcare, energy and business all rely on insightful analysis, and business analytics spending will grow to $89.6 billion in 2018. This is a tremendous opportunity, and Capitol's PhD program will prepare you for it. Learn more now!
Data Science Articles & Videos
5 Simple Math Problems No One Can Solve
Mathematics can get pretty complicated. Fortunately, not all math problems need to be inscrutable. Here are five current problems in the field of mathematics that anyone can understand, but nobody has been able to solve...
Building a (semi) Autonomous Drone with Python
They might not be delivering our mail (or our burritos--tacocopter.com) yet, but drones are now simple, small, and affordable enough that they can be considered a toy. You can even customize and program some of them! The Parrot AR Drone has an API that lets you control not only the drone's movement but also stream video and images from both of its cameras. I'll show you how you can use Python and node.js to build a drone that moves all by itself...
Neural Enhance
As seen on TV! What if you could increase the resolution of your photos using technology from CSI laboratories? Thanks to deep learning and #NeuralEnhance, it's now possible to train a neural network to zoom into your images at 2x or even 4x...
Learning Scalable Deep Kernels with Recurrent Structure
Many applications in speech, robotics, finance, and biology deal with sequential data, where ordering matters and recurrent structures are common. However, this structure cannot be easily captured by standard kernel functions. To model such structure, we propose expressive closed-form kernel functions for Gaussian processes...
Once Again: Prefer Confidence Intervals to Point Estimates
Today I saw a claim being made on Twitter that 17% of Jill Stein supporters in Louisiana are also David Duke supporters. For anyone familiar with US politics, this claim is a priori implausible, although certainly not impossible. Given how non-credible this claim struck me as being, I decided to look into the origin of this number of 17%...
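The point about intervals is easiest to see with a small subgroup. Below is a minimal Python sketch of the Wilson score interval, one standard confidence interval for a binomial proportion (not necessarily the method the linked post uses). The counts, 4 of 24 respondents (about 17%), are hypothetical: the excerpt does not say how many Stein supporters the underlying poll actually contained.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95% coverage)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z**2 / n
    # Centre is shrunk toward 0.5, which keeps the interval sensible for small n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


if __name__ == "__main__":
    # Hypothetical subgroup: 4 of 24 respondents, a point estimate of about 17%
    low, high = wilson_interval(4, 24)
    print(f"point estimate {4 / 24:.0%}, 95% CI {low:.0%} to {high:.0%}")
```

For these hypothetical counts the interval runs from roughly 7% to 36%, which is why a bare "17%" drawn from a small subsample says much less than it appears to.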
Identify and monitor social/historical cues for short-term stock movement
Using stock historical data, train a supervised learning algorithm with any combination of financial indicators. Rapidly backtest your model for accuracy and simulate investment portfolio performance...
State and National Poll Aggregation
This is a Stan implementation of Drew Linzer’s dynamic Bayesian election forecasting model, with some tweaks to incorporate national poll data, pollster house effects, correlated priors on state-by-state election results and correlated polling errors...
Neural Machine Translation in Linear Time
A new neural net for language modeling and machine translation! A fast and simple way of capturing very long-range dependencies...
Jobs
Data-Driven Content Developer - Sensor Tower - San Francisco, CA Drawing upon the rich data and powerful analysis capabilities of our market intelligence products for the mobile app ecosystem, Sensor Tower’s content developers are part market analyst, part storyteller, and part data evangelist. We’re seeking candidates who share our desire to provide incisive analysis of emerging app trends and key events to industry experts and global media organizations for our growing mobile insights team. If a role that blends data analysis with creating insightful articles, comprehensive reports, and beautiful visualizations sounds like it was made for you, we’d love to talk...
Training & Resources
tsfresh: Automatic extraction of relevant features from time series
This repository contains the TSFRESH python package. The abbreviation stands for "Time Series Feature extraction based on scalable hypothesis tests". The package contains many feature extraction methods and a robust feature selection algorithm...
Shepherding Random Numbers
The following is a tiny guide to shepherding random numbers. Originally I used this as a part of a presentation, but it seemed like it would work well as a little text as well. It does not really deal with statistics or probability, it is simply a collection of a few useful tricks I've learnt for manipulating random numbers...
Curated Free Learning Resources
Curated list of FREE Stats, Data Science and Computer Science lectures separated by category and difficulty level (hover over Free Lectures at top of page)...
Books
What is a p-value anyway? 34 Stories to Help You Actually Understand Statistics
"Offers a fun introduction to the fundamental principles of statistics, presenting the essential concepts in thirty-four brief, enjoyable stories"...
For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.
P.S. Interested in reaching fellow readers of this newsletter? Consider sponsoring! Email us for details :)
All the best, Hannah & Sebastian