Data Science Weekly - Issue 44
Issue #44 Sept 25 2014
SPECIAL NOTICE: FREE 2-DAY PASS for Strata Conference + Hadoop World
October 15–17, 2014 | New York, NY
Strata + Hadoop World is where cutting-edge data science and new business fundamentals intersect—and merge. Strata brings together the decision makers using big data to drive business strategy and practitioners who collect and analyze the data. Combined with Hadoop World, the joint event is also the largest gathering of the Apache Hadoop community in the world.
We have ONE ticket to give away, so just hit reply and tell us "which session excites you most" to enter.
Editor Picks
Style in the Long Tail: Discovering Unique Interests with Latent Variable Models in Large Scale Social E-commerce [winner KDD best industry paper]
At Etsy, an online marketplace for handmade and vintage goods with over 30 million diverse listings, the problem of capturing taste is particularly important – users come to the site specifically to find items that match their eclectic styles. In this paper, we describe our methods and experiments for deploying two new style-based recommender systems on the Etsy site...
Predicting NYC Taxi Tips
After cleaning the original dataset and drawing a sample from it, it's possible to predict, with an accuracy of 71.74%, whether the tip for a NYC taxi trip will be less than 20% or greater than or equal to 20% of the charge, without using any information about the passengers...
Detecting Anomalies with Neural Networks
The focus of this work was finding outliers in the gas consumption of some buildings...
Data Science Articles & Videos
Gilt's pre-emptive shipping is an experiment in data science
In this Q&A, Igor Elbert describes Gilt's pre-emptive shipping program, which leans on his expertise in machine learning and predictive modeling...
From Reducing Friendly Fire to Analyzing Social Data: Joseph Misiti Interview
We recently caught up with Joseph Misiti, co-founder of Math & Pencil, SocialQ and more! We were keen to learn more about his background, his work at SocialQ, and thoughts on how Data Science is evolving. Also, given his thought-provoking article "Why becoming a data scientist is NOT actually easier than you think", we were keen to garner his advice on how best to enter the field...
How to use R, H2O and Domino for a Kaggle Competition
This post is a proper machine learning case study based on a recent Kaggle competition: I am leveraging R, H2O and Domino to compete (and do pretty well) in a real-world data mining contest...
Political Ideology Detection Using Recursive Neural Networks
Taking inspiration from recent work in sentiment analysis that successfully models the compositional aspect of language, we apply a recursive neural network (RNN) framework to the task of identifying the political position evinced by a sentence...
Implementation of Neural Tensor Network
This is an implementation of Neural Tensor Network, as mentioned in the following research paper...
Comparing machine learning models in R
While preparing for the DataWeek R Bootcamp that I conducted this week, I came across the following gem. This code, based directly on a Max Kuhn presentation from a couple of years back, compares the efficacy of two machine learning models on a training data set...
Multicore LDA in Python: from over-night to over-lunch
Latent Dirichlet Allocation (LDA), one of the most used modules in gensim, has received a major performance revamp recently. Because the new LdaMulticore class now uses all your machine cores at once, chances are it is limited only by the speed at which you can feed it input data. Make sure your CPU fans are in working order!...
Anomaly Detection: A Survey
Anomaly detection is an important problem that has been researched within diverse research areas and application domains. Many anomaly detection techniques have been specifically developed for certain application domains, while others are more generic. This survey tries to provide a structured and comprehensive overview of the research on anomaly detection...
How To Build and Use a Multi GPU System for Deep Learning
When I started using GPUs for deep learning, my deep learning skills improved quickly...
Jobs
Data Scientist - Birchbox - NYC
Birchbox is looking for a data scientist who is a great software engineer – or a great software engineer who really knows statistics and machine learning – to work on data-driven tasks...
Training & Resources
Machine Learning for (Smart) Dummies
One of the benefits of the open academic collaborations that Yahoo Labs encourages, including mine, is the knowledge transfer each party brings to the table. It is in the same spirit of collaboration and open discourse that we are offering all of the seven classes below for your professional and/or personal enrichment...
Top 10 presentations about data science / big data on SlideShare
The topic of data science/big data is growing in popularity. We checked what attracted the most attention (measured in views) on SlideShare!...
Books
The Cartoon Guide to Statistics
Covers all the central ideas of modern statistics...
"This book is exceptional in its ability to communicate difficult concepts in a light and entertaining manner..."
For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages, check out our resources page.
P.S. Enjoyed the newsletter? Please forward it to friends and peers - we'd love to have them on board too :-)
All the best, Hannah & Sebastian