Data Science Weekly - Issue 55
Issue #55, Dec 11, 2014
Special Message: We would like to make a shameless plug for a new book authored by our very own Sebastian Gutierrez: Data Scientists at Work. More details are in the "Books" section, but rest assured it is the perfect holiday read ;)
Editor Picks
The Current State of Machine Intelligence
I spent the last three months learning about every artificial intelligence, machine learning, or data related startup I could find — my current list has 2,529 of them to be exact. Yes, I should find better things to do with my evenings and weekends but until then...
Simulating Decisions to Improve Them
One of the jobs of the Data Science team is to help zulily make better decisions through data. One way that manifests itself is via experimentation. Like most ecommerce sites, zulily continuously runs experiments to improve the customer experience...
Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images
Deep Neural Networks (DNNs) have recently been achieving state-of-the-art performance on a variety of pattern-recognition tasks, most notably visual classification. Given that DNNs are now able to classify objects in images with near-human-level performance, questions naturally arise as to what differences remain between computer and human vision...
Data Science Articles & Videos
Neural Representation of Language Learning - Tom Mitchell
How does the human brain use neural activity to create and represent meanings of words, sentences and stories?...
Cyrus Vance Jr.’s ‘Moneyball’ Approach to Crime
A glance at New York City crime statistics might lead you to conclude that Cyrus Vance Jr., the district attorney of New York County, no longer works in what William Travers Jerome, who held the job more than a century ago, once called “the mouth of hell.” ...
Experiments at Airbnb
While the basic principles behind controlled experiments are relatively straightforward, using experiments in a complex online ecosystem like Airbnb during fast-paced product development can lead to a number of common pitfalls. Some, like stopping an experiment too soon, are relevant to most experiments. Others, like the issue of introducing bias on a marketplace level, start becoming relevant for a more specialized application like Airbnb. We hope that by sharing the pitfalls we’ve experienced and learned to avoid, we can help you to design and conduct better, more reliable experiments for your own application...
Where Are Data Science Jobs Located?
You want to get a data science job, but do you have to move to the San Francisco area to get it?...
Do we Need Hundreds of Classifiers to Solve Real World Classification Problems?
We evaluate 179 classifiers arising from 17 families, implemented in Weka, R (with and without the caret package), C and Matlab, including all the relevant classifiers available today...
The wonderful and terrifying implications of computers that can learn
TEDx Talk by Jeremy Howard, CEO of Enlitic, which uses recent advances in machine learning to make medical diagnostics faster, more accurate, and more accessible...
Neural Networks Demystified [Part 4: Backpropagation]
Backpropagation as simple as possible, but no simpler. Perhaps the most misunderstood part of neural networks, Backpropagation of errors is the key step that allows ANNs to learn. In this video, I give the derivation and thought processes behind backpropagation using high school level calculus. ...
Using Data for a More Transparent Government
Our team at the Data Science for Social Good Fellowship, in collaboration with the Harris School of Public Policy, developed an automated system that uses machine-learning methods to identify earmarks in congressional documents. Using this approach, we construct the first publicly available database of earmarks that covers every year back to 1995....
The Doom that Came to Puppet
Posts generated by a Markov chain trained on the Puppet documentation and the assorted works of H. P. Lovecraft...
Jobs
Senior Data Scientist - Shutterstock, New York, NY
As a Senior Data Scientist, you will be joining the team responsible for pushing technology boundaries in areas such as language translation, image recognition, natural language processing, and search ranking. Your work will directly empower the Shutterstock customer experience seen by millions of customers daily, and will enable new and unique customer features that drive Shutterstock's best-in-class image and video search engine...
Training & Resources
Practical Data Science in Python
This notebook accompanies my talk on "Data Science with Python". The goal of this talk is to demonstrate some high level, introductory concepts behind (text) machine learning. The concepts are accompanied by concrete code examples in this notebook, which you can run yourself (after installing IPython, see below), on your own computer...
A Tutorial on Principal Component Analysis
Principal component analysis (PCA) is a mainstay of modern data analysis - a black box that is widely used but (sometimes) poorly understood. The goal of this paper is to dispel the magic behind this black box...
ScienceOps: Public Sandbox
A platform for deploying, managing, and scaling predictive models in production applications...
Books
Data Scientists at Work
JUST RELEASED: A collection of interviews with sixteen of the world's most influential and innovative data scientists from across the spectrum of this hot new profession...
"In this book, you will see how some of the world's top data scientists work across a dizzyingly wide variety of industries and applications – each leveraging their own blend of domain expertise, statistics, and computer science to create tremendous value and impact..."
- Peter Norvig, Director of Research, Google
For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages, check out our resources page.
P.S. Did you check out the book? It would make our day (and frankly our year!) if you'd take a quick look and help spread the word :-) - All the best, H & S