Data Science Weekly

Oct 20, 2016

Issue #152 Oct 20 2016

Editor Picks

Statistical Numbing
We have Professor Paul Slovic from the University of Oregon on the show to talk about “Statistical Numbing.” Professor Slovic is a renowned expert on the effect of numbers and statistics on empathy (or lack thereof). His fascinating, if depressing, experiments have consistently shown how hard it is for statistics to elicit any sense of scale in human tragedies and how numbers can often even be detrimental if the goal is to elicit compassion and generous actions from an audience...

Estimating the value of a vehicle with R
We tend to think of R and other such ML tools only in the context of the workplace, to do “weighty” things aimed at saving millions. A little judicious use of R may help us hugely in our personal lives too...

Achieving Human Parity in Conversational Speech Recognition
Conversational speech recognition has served as a flagship speech recognition task since the release of the DARPA Switchboard corpus in the 1990s. In this paper, we measure the human error rate on the widely used NIST 2000 test set, and find that our latest automated system has reached human parity...

Harness the business power of big data.

How far could you go with the right experience and education? Find out at Capitol Technology University. Earn your PhD in Management & Decision Sciences, in as little as three years, in convenient online classes. Banking, healthcare, energy and business all rely on insightful analysis. And business analytics spending will grow to $89.6 billion in 2018. This is a tremendous opportunity, and Capitol’s PhD program will prepare you for it. Learn more now!

Why I’m Backing Deep Drumpf, And You Should Too
This year’s U.S. presidential election has been pretty nutty. An algorithm that spews incoherent and outrageous soundbites might not even be the worst candidate in the running. Just such an algorithm, called Deep Drumpf, has indeed entered the race. Built using a deep-learning algorithm that’s been fed the transcripts of numerous Donald Trump speeches, it automatically generates tweets that seem remarkably similar to many of those issued by the candidate himself...

Altair Examples
This repository contains some examples of visualizations using Altair...

Don’t Become a Victim of One Key Metric
The One Key Metric, North Star Metric, or One Metric That Matters has become standard operating procedure in startups as a way to manage a growing business. In principle, this solves a lot of problems. It has people chasing problems that affect user engagement instead of top line metrics that look nice for the business. I have seen it abused multiple times though, and I’ll point to a few examples of how it can go wrong...

Debate Night Twitter: Analyzing Twitter’s Reaction to the Presidential Debate
With the debate drawing a near-unprecedented amount of attention and hostility, I wanted to gauge Twitter’s reaction to the second debate. In this project, I streamed tweets under the hashtag #debate and analyzed them to discover trends in Twitter’s mood and how users were reacting not just to the debate overall but to certain events in the debate...

How Vector Space Mathematics Helps Machines Spot Sarcasm
Sarcasm is almost impossible for computers to spot. A mathematical approach to linguistics could change that...

DeepMind’s differentiable neural computer helps you navigate the subway with its memory
In his best-selling 2011 book Thinking, Fast and Slow, Nobel Prize-winning economist Daniel Kahneman hypothesized that thinking could be broken down into two distinct processes, aptly named fast and slow thought. The former is all about your gut, the initial automatic responses you have to things, while the latter is calculated, reflective and time-consuming. A new algorithm from DeepMind is beginning to show us that so-called “slow” thinking may soon be within the reach of machine learning...

Deconvolution and Checkerboard Artifacts
When we look very closely at images generated by neural networks, we often see a strange checkerboard pattern of artifacts. It’s more obvious in some cases than others, but a large fraction of recent models exhibit this behavior. Mysteriously, the checkerboard pattern tends to be most prominent in images with strong colors. What’s going on? Do neural networks hate bright colors? The cause of these artifacts is actually remarkably simple, as is a method for avoiding them...

Data Scientist - RealScout - Mountain View, SF and Philadelphia
RealScout’s goal is to provide better transparency between real estate agents, home sellers and home buyers to bring efficiency to the $1 trillion residential real estate marketplace. And one of our theses is to accomplish this through data.

A typical data scientist week at RealScout entails improving and expanding our custom computer vision model across tens of millions of photos, conducting survival analysis to predict how long a home will stay on the market, or predicting how likely a home buyer is to make a purchase. You'll also get a chance to work on our data pipeline as well as put your models into production in front of tens of thousands of agents and home buyers...

How to Use t-SNE Effectively
Although extremely useful for visualizing high-dimensional data, t-SNE plots can sometimes be mysterious or misleading. By exploring how it behaves in simple cases, we can learn to use it more effectively...

Algobeans
Looking for digestible data science tutorials? Our promise: no math added. From traditional favorites like regression to novelties like deep learning, we offer them all using only the most intuitive ingredients...

Python For Data Science Cheat Sheet
Basics that beginners need to know in order to get started on doing data science with Python...

Statistics As Principled Argument
"This is a great book. Everyone who uses statistics in any way should read it. Maybe everyone who READS articles that contain statistics should read it! The mathematics is minimal (very few formulas, and those are basic), but a lot of very good advice on how to use statistics sensibly (and how it is sometimes used nonsensically!)"...

For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.

P.S. Interested in reaching fellow readers of this newsletter? Consider sponsoring! Email us for details :) - All the best, Hannah & Sebastian