Data Science Weekly - Issue 13
Issue #13 February 20, 2014
Editor Picks
Flappy Bird hack using Reinforcement Learning
This is a hack for the popular game, Flappy Bird. After playing the game a few times, I saw the opportunity to practice my machine learning skills and try to get Flappy Bird to learn how to play the game by itself...
Why Google Is Investing In Deep Learning
Google’s acquisition of DeepMind has everyone talking about deep learning. Here's what you need to know to join the conversation...
A Billion Rows per Second: Metaprogramming Python for Big Data
Ville Tuulos, Principal Engineer at AdRoll, demonstrates how they use Python to squeeze every bit of performance out of a single high-end server. They manage it with Numba, a new NumPy-aware dynamic Python compiler based on LLVM. Find out more in this informative talk from the SF Python Meetup...
Data Science Articles & Videos
Overspecialization throws Data Science Dream Teams Off-Balance
Building a data science team is difficult enough, but growing one without losing the team's effectiveness is a major challenge. Here's why overspecialization is the wrong approach to growth...
The Billion Dollar AI Castle in the Air
High-tech companies (e.g., Microsoft, Google, Facebook, Netflix, Intel, Amazon) are pouring billions of dollars into a branch of artificial intelligence called machine learning. Below, I argue that, in spite of their initial successes, current approaches to machine learning will fail primarily because this is not the way the brain works...
This Algorithm can Predict a Revolution
For students of international conflict, 2013 provided plenty to examine. There was civil war in Syria, ethnic violence in China, and riots to the point of revolution in the Ukraine. For those working at Duke University’s Ward Lab, all specialists in predicting conflict, the year looks like a betting sheet, full of predictions that worked and others that didn’t pan out...
How to Speed up a Python Program 114,000 times
Optimizations are one thing -- making a serious data collection program run 114,000 times faster is another thing entirely. Leaning on 30+ years of programming experience, David Schachter goes over all the optimizations he made to his (secret) company's data-collecting program to get such massive performance gains. In doing so, he might be able to teach you a thing or two about optimizing a Python program...
Cray Discovers a Viable Approach to Hadoop in Big Data Science
Hadoop is certainly well known as a general framework for Big Data analytics, but many have questioned whether it is suited for scientific Big Data. We caught up with Mike Boros, Hadoop Product Manager at Cray, to learn about the company’s solution for this quandary...
How does LinkedIn's Recommendation System work?
Ever since I studied Machine Learning and Data Mining at Stanford 3 years ago, I have been enamored with the idea that it is now possible to write programs that can sift through TBs of data to recommend useful things. So here I am with my colleague Adil Aijaz, for a talk on some of the lessons we learnt and challenges we faced in building a large-scale recommender system...
Spectral Clustering: Intuition and Implementation
Clustering is an important task that pertains to many areas. Spectral clustering is one clustering method. We will present some intuition on what it is, then go into a high-level overview of the algorithm with experimental results, along with pseudocode and implementation details. This is meant to be a lighthearted overview...
Scalability of Statistical Procedures: Why the p-value bashers just don't get it
The P-value is in the news again. Nature came out with a piece talking about how scientists are naive about the use of P-values, among other things. The problem is not that people use P-values poorly; it is that the vast majority of data analysis is performed by people who are not properly trained...
Conditional Probability: A Visual Exploration
A simple but effective explanation and visualization of how conditional probability works...
Jobs
Data Scientist/Quantitative Analyst, YouTube - San Bruno, CA
At YouTube, data drives the way we make decisions. As a Data Scientist, you should be experienced with and passionate about using data to drive strategy and product recommendations. You are able to both engage with senior leaders to design well-constructed analyses and work cross-functionally with analysts, product managers and engineers to effectively deliver actionable results. The ideal candidate is an independent, solution-oriented thinker with a strong background processing huge data sets, applying analytical rigor and statistical methods, and driving toward insights and solutions...
Training & Resources
Strata: Making Data Work - Speaker Slides & Videos
Comprehensive set of presentations from the recent Strata conference...
Machine Learning and Probabilistic Graphical Models Course
The course covers the necessary theory, principles and algorithms for machine learning. Following are course topics with pointers to lecture overhead slides and some lecture video files...
14 Ebooks For Learning AI And Robotics
Here are 15 ebooks that will give you a clearer picture on what is happening in AI (with most of the books available for free!)...
Deep Learning Pioneer, Yoshua Bengio doing an AMA on Feb. 24
Université de Montréal Professor Yoshua Bengio will be visiting /r/MachineLearning for an AMA (Ask Me Anything) on Feb 24 at 1 PM EST...
P.S. Did you enjoy the newsletter? Do you have friends/colleagues who might like it too? If so, please forward it along - we would love to have them onboard :)