Data Science Weekly - Issue 73
Issue #73, April 16, 2015
Editor Picks
Reddit AMA with Andrew Ng (Baidu)
Dr. Andrew Ng is Chief Scientist at Baidu. He leads Baidu Research, which includes the Silicon Valley AI Lab, the Institute of Deep Learning and the Big Data Lab. The organization brings together global research talent to work on fundamental technologies in areas such as image recognition and image-based search, speech recognition, and semantic intelligence...
How Airbnb uses machine learning to detect host preferences
What started as a small research project resulted in the development of a machine learning model that learns our hosts’ preferences for accommodation requests based on their past behavior. For each search query that a guest enters on Airbnb’s search engine, our model computes the likelihood that relevant hosts will want to accommodate the guest’s request...
Can Amazon Make Machine Learning More Accessible?
First we had IBM Watson Analytics, then Microsoft launched Azure Machine Learning. With last week’s launch of Amazon Machine Learning, the e-commerce giant is the latest technology company to attempt to democratize the development and use of machine learning models and make the technology useful to people who aren’t data scientists...
A Message from this week's Sponsor
Want to be a Data Scientist, but don't know where to start?
Learn essential Data Science skills in SlideRule's Intro to Data Science Workshop. In this online bootcamp, you'll learn R, data wrangling, analytics and visualization by working on real projects, with 1-on-1 mentorship from expert Data Scientists from LinkedIn, Glassdoor, Trulia and Stripe.
Spots are limited; registration ends in 48 hours!
Data Science Articles & Videos
Machine Learning Algorithm Mines 16 Billion E-Mails
Human e-mailing behavior is so predictable that computer scientists have created an algorithm that can calculate when an e-mail thread is about to end...
Building a Client-side Blog Search Algorithm
A few months ago we noticed that our blog was getting really hard to navigate. One of the most frequent requests we would get from people was the ability to search for posts. Sounds a bit obvious but we were all a bit surprised. Searching for posts?!? How many posts can we possibly have? Do we really have that many that people need to be able to search for them?...
Topic modeling on beer reviews
The purpose of this project is to investigate topic modeling in multi-aspect reviews. More specifically, I wanted to investigate a way to find the words in reviews which were associated with the different categories being rated. Since I, like seemingly all data scientists, love beer, I was thrilled to find a dataset containing about 1.5 million beer reviews from the BeerAdvocate website. Below is a summary of my workflow and findings in playing around with this dataset...
Chief Data Officer Role Shifts to Offense
Many organizations are redefining the chief data officer role from one solely concerned with data governance to a role responsible for helping generate new information-based products and services...
A comparison of open source tools for sentiment analysis
The objective of this project is to apply various NLP sentiment analysis techniques to reviews from the Yelp Dataset and assess their effectiveness at correctly identifying them as positive or negative...
Extracting Structured Data From Recipes Using Conditional Random Fields
NYT Cooking launched last fall with over 17,000 recipes that users can search, save, rate and (coming soon!) comment on. The product was designed and built from scratch over the course of a year, but it relies heavily on nearly six years of effort to clean, catalogue and structure our massive recipe archive...
Deep Learning vs Probabilistic Graphical Models vs Logic
Let's take a look back at Logic and Probabilistic Graphical Models and make some predictions on where the field of AI and Machine Learning is likely to go in the near future. We will proceed in chronological order...
Scientists develop algorithm that can auto-ban internet trolls
Researchers at Cornell University claim to be able to identify internet trolls with an AUC (area under the ROC curve) of more than 80%, raising the possibility of automated methods to identify or even auto-ban forum and comment-thread pests...
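For readers unfamiliar with the metric in the item above: AUC can be read as the probability that a classifier gives a randomly chosen positive example (here, a troll) a higher score than a randomly chosen negative one. The sketch below is a minimal illustration in plain Python, not the Cornell researchers' code; the function name `auc` and the toy labels and scores are made up for the example.

```python
def auc(labels, scores):
    """Return the area under the ROC curve via pairwise comparison.

    labels: iterable of 0/1 ground-truth values (1 = positive class).
    scores: iterable of classifier scores, higher = more likely positive.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = troll, 0 = regular user.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
print(round(auc(toy_labels, toy_scores), 3))  # prints 0.889
```

An AUC of 0.5 corresponds to random guessing and 1.0 to a perfect ranking. This quadratic-time version is only meant to show the definition; libraries typically compute the same quantity from sorted scores.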
Becoming a Full-Stack Statistician in 6 Easy Steps
It's been a fun challenge to go from being an academic statistician to a practicing data scientist deep in the trenches of the software industry. Here are a few essential skills that I have had to pick up along the way. Remember, to become a full-stack statistician, try to be as fast as possible in each of the following categories...
Jobs
Data Scientist - Custora - New York, NY
Imagine you had a limitless amount of data for every single customer for a major retailer: every product purchased online and in-store, every email opened and clicked, every product viewed on the website, every advertisement impression viewed, and more. With all that data at your fingertips, how could you improve the retailer's marketing efforts?...
Training & Resources
Jake VanderPlas - Machine Learning with Scikit-Learn (I) - PyCon 2015
This tutorial will offer an introduction to the core concepts of machine learning and the Scikit-Learn package. We will introduce the scikit-learn API, and use it to explore the basic categories of machine learning problems and related topics such as feature selection and model validation, and practice applying these tools to real-world data sets...
Research papers that changed the world of Big Data
If you are looking for some of the most influential research papers that revolutionised the way we gather, aggregate, analyze and store increasing volumes of data in the short span of 10 years, you are in the right place!...
Advice for applying Machine Learning
This post is based on a tutorial given in a machine learning course at University of Bremen. It summarizes some recommendations on how to get started with machine learning on a new problem...
Books
Learn R in a Day
Clear and efficient way to get up and running in R...
"I was delighted with this little book...it got me functional with R, able to enter, manipulate, and plot data usefully in less than 8 hours of work..."
For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.
P.S. Enjoyed the newsletter? Please forward it along to friends and colleagues - we'd love to have them onboard! - All the best, Hannah & Sebastian