Data Science Weekly - Issue 106
Issue #106, December 3, 2015
Editor Picks
Gadget Can Tell What’s Wrong with Your Air Conditioner by Listening to It
Augury’s gadget and iPhone app pay attention to ultrasonic sounds and vibrations to figure out what’s wrong with air conditioners and other big machines...
Inside the Hack Rod - The World's First AI Designed Car
The brainchild of the Primordial Research Project, this car is based on billions of data points plugged into generative-design software...
Deep Forger: Art Forgery Meets Deep Neural Nets
The past year has seen deep learning make exceptional advances in imaging, perhaps most notably with Google's Deep Dream. See how a clever Twitter bot employs deep neural nets to paint images in the style of famous painters...
A Message from this week's Sponsor:
[WEBINAR] Create Richly Interactive Visualizations with Open Source
Learn how to make richly interactive data visualizations for your open data science project with Anaconda and Bokeh. The webinar will be presented by Peter Wang, CTO & Co-founder of Continuum Analytics. Peter is the creator of Bokeh, the interactive visualization framework.
Join Us for the Webcast on December 15th
Data Science Articles & Videos
Data-mined photos document 100 years of (forced) smiling
By studying nearly 38,000 high-school yearbook photos taken since 1905, UC Berkeley researchers have shown just how much smiling, fashion and hairstyles have changed over the years. The goal was not just to track trends, but to figure out how to apply modern data-mining techniques and machine learning to a much older medium: photographs...
Black Boxes and Unicorns - DataRobot CEO Jeremy Achin
Jeremy Achin, CEO at DataRobot, presented at FirstMark's Data Driven NYC on November 23, 2015. Achin discussed technique-agnostic ways to assess and interpret predictive models...
Wikipedia Bets On AI To Rebuild Editor Ranks
Wikipedia will use a new machine learning service called ORES to automatically review revisions, flagging problematic ones and making it easier for human editors to get their proposed revisions approved...
Is Data Science The New Snake Oil?
Web Summit 2015 talk from Vitaly Gordon - Director of Data Science at Salesforce...
On Learning to Think: Algorithmic Information Theory for Novel Combinations of Reinforcement Learning Controllers and Recurrent Neural World Models
This paper addresses the general problem of reinforcement learning (RL) in partially observable environments. In 2013, our large RL recurrent neural networks (RNNs) learned from scratch to drive simulated cars from high-dimensional video input. However, real brains are more powerful in many ways. In particular, they learn a predictive model of their initially unknown environment, and somehow use it for abstract (e.g., hierarchical) planning and reasoning. Guided by algorithmic information theory, we describe RNN-based AIs (RNNAIs) designed to do the same...
Simple end-to-end TensorFlow examples
I flew from Austin to Washington DC last week, and the morning before my flight I downloaded TensorFlow, made sure everything compiled, downloaded the necessary datasets, and opened up a bunch of tabs with TensorFlow tutorials. My goal was, while on the airplane, to run the tutorials, get a feel for the flow of TensorFlow, and then implement my own networks for doing some made-up classification problems. I came away from the exercise extremely pleased. This post explains what I did and gives pointers to the code to make it happen...
Is Bayesian A/B Testing Immune to Peeking? Not Exactly
Since I joined Stack Exchange as a Data Scientist in June, one of my first projects has been reconsidering the A/B testing system used to evaluate new features and changes to the site. Our current approach relies on computing a p-value to measure our confidence in a new feature. Unfortunately, this leads to a common pitfall...
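To make the peeking pitfall concrete, here is a minimal sketch, not taken from the linked post. It simulates A/A tests (both arms share the same true conversion rate, so every "significant" result is a false positive) and compares checking the p-value repeatedly against checking it once at the end. The function names, the pooled two-proportion z-test, and all parameter values are assumptions chosen for illustration.

```python
import math
import random


def two_sided_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no variation observed yet, so no evidence of a difference
    z = (conv_a / n_a - conv_b / n_b) / se
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_peeking(n_experiments=500, n_per_arm=1000, check_every=100,
                     rate=0.1, alpha=0.05, seed=0):
    """Return (false positive rate with peeking, false positive rate with one final look)."""
    rng = random.Random(seed)
    peeking_hits = 0
    final_hits = 0
    for _ in range(n_experiments):
        conv_a = conv_b = 0
        flagged_while_peeking = False
        for i in range(1, n_per_arm + 1):
            conv_a += int(rng.random() < rate)
            conv_b += int(rng.random() < rate)
            # Peek at the running p-value at regular intervals.
            if i % check_every == 0:
                if two_sided_p_value(conv_a, i, conv_b, i) < alpha:
                    flagged_while_peeking = True
        final_significant = two_sided_p_value(conv_a, n_per_arm, conv_b, n_per_arm) < alpha
        final_hits += int(final_significant)
        # "Peeking" means stopping at the first significant look, including the last one.
        peeking_hits += int(flagged_while_peeking or final_significant)
    return peeking_hits / n_experiments, final_hits / n_experiments


if __name__ == "__main__":
    peek_rate, final_rate = simulate_peeking()
    print("false positive rate with peeking:", peek_rate)
    print("false positive rate with a single final look:", final_rate)
```

Because every experiment that is significant at the final look also counts as flagged under peeking, the peeking rate can never be lower than the single-look rate in this sketch. Whether a Bayesian procedure avoids the same inflation is the question the linked post takes up.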
Regularizing RNNs by Stabilizing Activations
We stabilize the activations of Recurrent Neural Networks (RNNs) by penalizing the squared distance between successive hidden states' norms. This penalty term is an effective regularizer for RNNs including LSTMs and IRNNs, improving performance on character-level language modelling and phoneme recognition, and outperforming weight noise...
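The penalty described above can be sketched in a few lines. This is an illustrative reading of the abstract, not the authors' code: the function name, the `beta` weight, and the choice to average over time steps are assumptions.

```python
import math


def norm_stabilizer_penalty(hidden_states, beta=1.0):
    """Penalize squared differences between the L2 norms of successive hidden states.

    hidden_states: list of hidden-state vectors h_0 .. h_T (each a list of floats).
    beta: hypothetical regularization weight.
    """
    norms = [math.sqrt(sum(x * x for x in h)) for h in hidden_states]
    squared_diffs = [(later - earlier) ** 2 for earlier, later in zip(norms, norms[1:])]
    if not squared_diffs:
        return 0.0  # fewer than two states, nothing to compare
    # Averaging over time steps is an assumption made for this sketch.
    return beta * sum(squared_diffs) / len(squared_diffs)


if __name__ == "__main__":
    states = [[3.0, 4.0], [0.0, 5.0], [6.0, 8.0]]  # norms are 5, 5 and 10
    print(norm_stabilizer_penalty(states))
```

In training, a term like this would be added to the task loss, so the network is discouraged from letting the magnitude of its hidden state grow or shrink sharply from one step to the next.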
Machine Intelligence In The Real World
I’ve been laser-focused on machine intelligence in the past few years. I’ve talked to hundreds of entrepreneurs, researchers and investors about helping machines make us smarter. On average, people seem most concerned about how to interact with these technologies once they are out in the wild. This post will focus on how these companies go to market, not on the methods they use...
5 Common Pitfalls To Avoid When Crafting Your Data Science Resume
You're spending hours on your resume and still have no job interviews to show for it. You've been having trouble landing Data Science job interviews and think your resume may be hurting you...
Jobs
Data Scientist, Smart Pricing - Walmart eCommerce - Sunnyvale, CA
We are a highly motivated group of Big Data engineers, Data Scientists and Applications Engineers, working in small agile groups to solve sophisticated and high-impact problems. We are building systems that ingest, model and analyze massive flows of data from online, social, mobile and offline commerce/user activity to set key business attributes for millions of products in real time. We use cutting-edge machine learning, data mining and optimization algorithms underneath it all to analyze all this data on top of Hadoop/HBase/Hive. Your work will be immediately visible to millions of people and you will have a direct impact on the business goals of the Fortune #1 company. If you talk, speak and think data, we want to talk to you. Come join our small team and be part of this exciting journey...
Training & Resources
awesome-nlp
A curated list of resources dedicated to Natural Language Processing...
U.S. Government’s open data
Here you will find data, tools, and resources to conduct research, develop web and mobile applications, design data visualizations, and more - over 188K datasets...
Beginner's Guide to Click-Through Rate Prediction with Logistic Regression
Let's say that you're a major search engine, and you need to decide which ad to display at the top of your search results. How would you do it?...
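As a rough illustration of the idea in the title, the sketch below fits a logistic regression with plain gradient descent and uses it to score ads by predicted click probability. It is not the linked guide's code. The two binary features (whether the ad is shown in the top slot, whether the query matches the ad) and the eight training rows are made-up toy data, and the function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_ctr(weights, bias, features):
    """Predicted click probability for one impression."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


def fit_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean log loss."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = predict_ctr(weights, bias, x) - y  # gradient of log loss w.r.t. the logit
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias


# Made-up toy impressions: [shown_in_top_slot, query_matches_ad] -> clicked (1) or not (0).
ROWS = [[1, 1], [1, 1], [1, 0], [0, 1], [0, 0], [0, 0], [1, 1], [0, 0]]
LABELS = [1, 1, 0, 1, 0, 0, 0, 0]

if __name__ == "__main__":
    weights, bias = fit_logistic(ROWS, LABELS)
    for candidate in ([1, 1], [0, 0]):
        print(candidate, round(predict_ctr(weights, bias, candidate), 3))
```

A search engine following this approach would score each candidate ad for the current query and rank by predicted click probability, typically in combination with other signals such as the advertiser's bid.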
Books
Python Data Science Cookbook
New release and getting very good reviews...
"This book gives a very practical approach to learn some of the important algorithms using Python. Great hands on experience. I am sure once you do the examples in this book, most of the fundamental concepts will be understood..."
For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages, check out our resources page.
P.S. Interested in reaching fellow readers of this newsletter? Consider sponsoring! Email us for details :) - All the best, Hannah & Sebastian