Data Science Weekly - Issue 47
Issue #47 Oct 16 2014
Editor Picks
Defending Credit for the World’s Poor with Data Science
How Bayes Impact is reducing fraud for microfinance nonprofit Zidisha...
This Math Model Is Predicting the Ebola Outbreak with Incredible Accuracy
Part of the allure of epidemiology is being able to describe and predict highly dynamic outbreaks with simple, clean mathematical models. But how close can models really get to perfectly mapping the spread of disease?...
How do I find Wally with Python?
Inspired by How do I find Waldo with Mathematica and the follow-up How to find Waldo with R, as a new Python user I'd love to see how this could be done. It seems that Python would be better suited to this than R...
Data Science Articles & Videos
Classifying Shakespearean Drama with Sparse Feature Sets
By analyzing only the length of a play and the number of words women speak in that play, one can start to get reasonably good separation between the genres: comedies tend to be shorter and include more female dialogue, histories tend to be longer and include less female dialogue, and tragedies split provocatively between the upper right and lower left. Reviewing these figures, I can't shake the suspicion that a third dimension of data could unite these divided tragedies. But what would that dimension consist of? ...
Transforming OpenTable into a Local Dining Expert: Sudeep Das Interview
We recently caught up with Sudeep Das, Astrophysicist and Data Scientist at OpenTable. We were keen to learn more about his background, his work in academia and how he is applying data science in his new role - transforming OpenTable into a local dining expert...
Fledgling data science profession under strain, says SAS research
The fledgling data science profession is under strain, with its mostly young workers bending themselves out of shape to adapt to corporate life, research has revealed...
Music Information Retrieval Using Locality Sensitive Hashing
Music information retrieval (MIR) is an interdisciplinary field bridging the domains of statistics, signal processing, machine learning, musicology, biology, and more. In this talk, we will survey common research problems in MIR, including music fingerprinting, transcription, classification, and recommendation, and recently proposed solutions in the research literature...
Smart Autofill - Harnessing the predictive power of Machine Learning in Google Sheets
You can now use machine learning to make predictions in Google Sheets with the newly launched Smart Autofill Add-on...
A Statistician's View on Big Data and Data Science in Pharmaceutical Development
This presentation gives a professional statistician's view on these terms in pharmaceutical development, illustrates the connection between data science and statistics, and highlights some challenges and opportunities from a statistical perspective...
Hyperparameter search, Bayesian optimization and related topics
In terms of (importance divided-by glamour), hyperparameter (HP) search is probably pretty close to the top. We all hate finding hyperparameters. Default settings are usually good, but you're always left wondering: could I have done better?...
Data Mining Reveals The Secret To Matching Crowdfunding Projects To Investors
If you want to crowdfund your next project, an algorithm can match you to the most likely investor, say computer scientists...
Neural Networks (ANN) using NodeJS : Machine Learning for Web Systems
This post offers a fundamental understanding of how to get started with neural networks and build applications using NodeJS...
Visualizing MNIST: An Exploration of Dimensionality Reduction
At some fundamental level, no one understands machine learning. It isn’t a matter of things being too complicated. Almost everything we do is fundamentally very simple. Unfortunately, an innate human handicap interferes with us understanding these simple things...
Jobs
Analyst/Data Scientist, Idibon - San Francisco, CA
We’re looking for someone who loves analyzing (language) data and talking to customers. Our clients answer all sorts of questions using Idibon’s products and you’ll help them figure out how to get the best results. This involves experimental design, writing scripts, running stats, making sure you understand client needs, and communicating those needs so that the rest of the team understands them, too...
[Note: Idibon, a start-up that has developed applications that make sense of text in scores of languages, has just announced a Series A round of $5.5M]
Training & Resources
A Hitchhikers Guide to Data Science
This article is supposed to be a rough guide for all aspiring data scientists. It is a whistle-stop tour of all the necessary steps to create a data science project, including the tools and practical tips...
In-depth introduction to machine learning in 15 hours of expert videos
In January 2014, Stanford University professors Trevor Hastie and Rob Tibshirani (authors of the legendary Elements of Statistical Learning textbook) taught an online course based on their newest textbook...
Deep Learning on the Amazon EC2 GPU using Python and nolearn
This post is full of screenshots as I document my way through setting up an Amazon EC2 GPU instance to train a Deep Belief Network using Python and nolearn...
Books
Data Science at the Command Line: Facing the Future with Time-Tested Tools
JUST RELEASED! A hands-on guide to demonstrate how the flexibility of the command line can help you become a more efficient and productive data scientist. Learn more about the book, who it's targeted at and what you can hope to learn in our interview with its author, Jeroen Janssens...
For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages, check out our resources page.
P.S. Enjoyed the newsletter? Please forward it to friends and peers - we'd love to have them onboard too :-) - All the best, Hannah & Sebastian