Data Science Weekly - Issue 28
Issue #28 June 5 2014
Editor Picks
Machine Learning as a Service: Making Sentiment Predictions in Realtime with ZMQ and NLTK
I am a Machine Learning (ML) and Natural Language Processing enthusiast. For my university dissertation I created a realtime sentiment analysis classifier for Twitter. My talk is about the experience and the lessons learned... showing how easy it can be to build a ML SaaS by using some of the amazing libraries such as NLTK, ZMQ and MrJob that have helped me...
The Internet in Real-Time
Amazing dynamic visual on how quickly data is generated...
Realtime Predictive Analytics using scikit-learn & RabbitMQ
Scikit-learn is an awesome tool allowing developers with little or no machine learning knowledge to predict the future! But once you’ve trained a scikit-learn algorithm, what now? In this talk, I describe how to deploy a predictive model in a production environment using scikit-learn and RabbitMQ. You’ll see a realtime content classification system to demonstrate this design...
Data Science Articles & Videos
A Growing Number of Applications are being built with Spark
The number of companies that are using (or plan to use) Spark in production has exploded over the last year. The surge in popularity of the Apache Spark ecosystem stems from the maturation of its individual open source components and the growing community of users...
Convolutional Network Demo from 1993, featuring Yann LeCun
This is a demo of "LeNet 1", the first convolutional network that could recognize handwritten digits with good speed and accuracy. It was developed between 1988 and 1993 in the Adaptive System Research Department, headed by Larry Jackel, at Bell Labs in Holmdel, NJ...
On the Importance of Text Analysis for Stock Price Prediction
We investigate the importance of text analysis for stock price prediction. In particular, we introduce a system that forecasts companies’ stock price changes (UP, DOWN, STAY) in response to financial events reported in 8-K documents. Our results indicate that using text boosts prediction accuracy over 10% (relative) over a strong baseline that incorporates many financially-rooted features...
MITx and HarvardX Release De-Identified Dataset from First Year of MOOCs
I'm pleased to announce today that my colleagues at HarvardX and MITx have released a de-identified person-course dataset from 16 courses from the first year of edX; the same dataset that was used to produce HarvardX and MITx: The First Year of Open Online Courses...
Yann LeCun's answers from the Reddit AMA
On May 15th Yann LeCun answered “ask me anything” questions on Reddit. We hand-picked some of his thoughts and grouped them by topic for your enjoyment...
Bandits for Recommendation Systems
In this blog post, we will discuss the bandit problem and how it relates to online recommender systems. Then, we'll cover some classic algorithms and see how well they do in simulation...
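For readers new to the bandit problem, here is a minimal sketch of one classic algorithm, epsilon-greedy, run in a toy simulation. It is our own illustration and not necessarily the code or the algorithms used in the linked post; the function name, the click-through rates and the parameter values are all made up for the example.

```python
import random


def epsilon_greedy(true_rates, steps=5000, epsilon=0.1, seed=0):
    """Simulate epsilon-greedy on Bernoulli arms (e.g. items to recommend).

    true_rates are hypothetical click-through rates, one per arm.
    Returns (pulls per arm, estimated rate per arm, total reward).
    """
    rng = random.Random(seed)
    n_arms = len(true_rates)
    pulls = [0] * n_arms       # how often each arm was shown
    values = [0.0] * n_arms    # running mean reward per arm
    total_reward = 0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the best estimate so far.
            arm = max(range(n_arms), key=lambda a: values[a])

        # Simulated user feedback: 1 = click, 0 = no click.
        reward = 1 if rng.random() < true_rates[arm] else 0

        pulls[arm] += 1
        # Incremental update of the running mean.
        values[arm] += (reward - values[arm]) / pulls[arm]
        total_reward += reward

    return pulls, values, total_reward


# Made-up rates for three items; the third is clearly the best.
pulls, values, total_reward = epsilon_greedy([0.1, 0.3, 0.8])
print(pulls, [round(v, 2) for v in values], total_reward)
```

With rates this far apart, most pulls should end up on the best item; a smaller epsilon explores less, which pays off only if the early estimates happen to be right.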
Optimizing QR Decomposition of Tridiagonal Matrices in Julia
I recently implemented low-rank matrix approximations (in Matlab) for my numerical linear algebra class, so I figured reimplementing part of the algorithm in Julia would be a good way to get my feet on the ground in Julia land...
Statistical Language Wars: The Infograph
A feature all programming communities have in common is the numerous debates about why their programming language of choice is better, more advanced, faster, holier etc. In today’s data science community, it seems like these discussions are omnipresent with advocates of SAS, SPSS, R, Python, Julia, etc. battling and challenging each other on every online medium...
Everything You Wanted to Know about the Kernel Trick
The goal of this writeup is to provide a high-level introduction to the "Kernel Trick" commonly used in classification algorithms such as Support Vector Machines (SVM) and Logistic Regression. My target audience are those who have had some basic experience with machine learning, yet are looking for an alternative introduction to kernel methods...
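As a quick taste of the idea, the sketch below checks a common textbook case: the degree-2 polynomial kernel on 2-D inputs gives the same number as an ordinary dot product in an explicit 3-D feature space. This is our own illustration, not taken from the linked writeup; the helper names `phi`, `dot` and `poly_kernel` are made up for the example.

```python
import math


def dot(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(p * q for p, q in zip(a, b))


def phi(v):
    """Explicit feature map for the kernel (x . y)^2 on 2-D inputs."""
    x1, x2 = v
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def poly_kernel(x, y):
    """Degree-2 polynomial kernel, computed without ever building phi."""
    return dot(x, y) ** 2


x, y = (1.0, 2.0), (3.0, 4.0)

# Both routes give the same value (up to floating-point rounding):
# the kernel works in 2-D, the explicit map works in 3-D.
print(poly_kernel(x, y), dot(phi(x), phi(y)))
```

The kernel never constructs the higher-dimensional vectors, which is the whole point: for richer kernels the explicit feature space can be very large or even infinite-dimensional.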
Jobs
Senior Data Scientist - The Weather Company, Madison WI
Are you interested in applying machine learning or data mining on problems that truly improve people’s life? We’re looking for a mathematician/data scientist eager to tackle unique challenges in the realm of predicting weather’s impact on business. You will work on a skilled team of passionate data scientists and meteorologists. Examples of projects you may encounter would be anything from predicting the electricity output of a solar park in Arizona, to predicting how much ice cream is going to be sold next week in Chicago...
Training & Resources
5 Great Resources For Learning Linear Algebra
List of helpful resources on the subject...
Deep Learning
Draft of a book on Deep Learning by Yoshua Bengio, Ian Goodfellow, and Aaron Courville...
Step-by-Step Guide to Setting Up an R-Hadoop System
This is a step-by-step guide to setting up an R-Hadoop system. I have tested it both on a single computer and on a cluster of computers...
Books
Outlier Detection for Temporal Data
Just released!...
"Outlier Detection for Temporal Data covers topics in temporal outlier detection, which have applications in numerous fields. It starts with the basic topics then moves on to state of the art techniques in the field."
For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages, check out our resources page.
P.S. Did you enjoy the newsletter? Do you have friends/colleagues who might like it too? If so, please forward it along - we would love to have them onboard :)