Data Science Weekly - Issue 15
Issue #15, March 6, 2014
Editor Picks
Machine Learning in 10 Pictures
I find myself coming back to the same few pictures when explaining basic machine learning concepts. Below is a list I find most illuminating...
Using CART for Stock Market Forecasting
Most of the time literature on market forecasting mixes two market features: Magnitude and Direction. In this article I focus on identifying the market direction only... market conditions when the odds are significantly biased toward an up or a down market. This post gives an example of how CART (Classification And Regression Trees) can be used in this context...
Why Apache Spark is a Crossover Hit for Data Scientists
While discussion about Spark for data science has mostly noted its ability to keep data resident in memory... this is perhaps not even the big news, not to me. It does not solve every problem for everyone. However, Spark has a number of features that make it a compelling crossover platform for investigative as well as operational analytics...
Data Science Articles & Videos
Automated Decision-Making: Machines don't know it all, but they're learning
For some people, being recognized as the top gun in your line of work only to find yourself looking for a new line of work a month later might trigger a come-to-Jesus moment - George Karl isn't "some people."...
Big Data correctly predicts winners at the Oscars
Guessing the Oscar winners is a fun pastime for many movie fans and industry insiders. They are all thrilled when they guess right. It's all part of the fun of Oscar night. But last year, big data stole the show when Farsite analysts correctly predicted five out of the six winners and they were ready to do it again. This year, six out of six predictions were correct...
ETL: The most important Acronym you've never heard of
As impression volumes rise into trillions across all manner of devices, the focus of many ad tech engineering teams isn’t on ethereal ML algorithms, but something far less glamorous. The process is called ETL — the critical, painstaking work of cleansing and consolidating disparate datasets...
After building a powerful Recommendation System for Netflix, this Guy wants to help you find your next favorite book
Nicholas Ampazis builds software that makes recommendations for you. Where it was once making movie recommendations on Netflix, he's turning it to the book world...
Bayesian Bandits testing for mobile apps
While A/B testing is a competent tool in evaluating variants for a simple process – for example, the best-converting variant of an e-commerce landing page that isn't likely to change in the future relative to the rest of the site – it's perhaps not well-suited to dynamic mobile apps operating as services. An alternative to A/B testing is Bayesian Bandits testing...
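For readers new to the idea, here is a minimal sketch of one common form of Bayesian Bandits testing, Thompson sampling with Beta posteriors. It is an illustration only and is not taken from the linked article: the function name, the simulated conversion rates, and the number of rounds are all assumptions.

```python
import random


def thompson_sampling(true_rates, n_rounds=5000, seed=0):
    """Simulate a Bernoulli bandit test with Thompson sampling.

    true_rates are hypothetical conversion rates, unknown to the algorithm
    and used only to simulate user responses.
    """
    rng = random.Random(seed)
    n_arms = len(true_rates)
    successes = [0] * n_arms
    failures = [0] * n_arms

    for _ in range(n_rounds):
        # Draw one plausible conversion rate per variant from its
        # Beta(successes + 1, failures + 1) posterior (uniform prior).
        samples = [
            rng.betavariate(successes[i] + 1, failures[i] + 1)
            for i in range(n_arms)
        ]
        # Show the variant whose sampled rate is highest.
        arm = max(range(n_arms), key=samples.__getitem__)

        # Simulate whether this user converts, then update the posterior.
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1

    return successes, failures


# Three made-up variants; the third converts best.
successes, failures = thompson_sampling([0.04, 0.05, 0.12])
pulls = [s + f for s, f in zip(successes, failures)]
print("times each variant was shown:", pulls)
```

Unlike a fixed A/B split, traffic shifts toward the better-performing variant as evidence accumulates, while weaker variants are still shown occasionally.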
AMA with Yoshua Bengio - Transcript & Comments
Yoshua Bengio is one of the ML professors who led the deep learning renaissance of 2006, along with Geoff Hinton and Yann LeCun (and one of the last deep learning professors to remain completely in academia). This post features the top 200 comments/responses on his recent reddit AMA...
Birds, Bees and Big Data: How One App helped 50,000 Women get Pregnant
Almost everyone over the age of 12 knows how babies are made. But big data is changing even that in a big way. At the very least, it’s making the process of getting pregnant a lot more predictable...
Is Julia the Future for Big Data Analytics?
In many Big Data blogs, meetups and in the halls of the most recent O’Reilly Strata Conference, one of the most-discussed topics is which language is better for data analysis: Python or R. Some of the talk has even reached “religious” overtones not unlike previous discussions on Windows vs. Linux or Microsoft’s Internet Explorer vs. Mozilla Firefox. So what’s the issue here?...
Rendering scikit Decision Trees in D3.js
Scikit-learn provides routines to export decision trees to a format called Graphviz, although typically this is used to provide an image of a chart. For some applications this is valuable, but if the product of machine learning is the ability to generate models (rather than predictions), it would be preferable to provide interactive models...
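As a rough illustration of the export step mentioned in that blurb, the sketch below fits a small tree on the bundled iris data, exports it with scikit-learn's export_graphviz, and also walks the fitted tree into a nested dictionary of the kind D3.js hierarchy layouts usually consume. The tree_to_dict helper and its "name"/"children" keys are assumptions made here for illustration; they are not the linked post's code.

```python
import json

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_graphviz

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0)
clf.fit(iris.data, iris.target)

# With out_file=None, export_graphviz returns the Graphviz DOT source as a string.
dot = export_graphviz(clf, out_file=None, feature_names=iris.feature_names)


def tree_to_dict(clf, feature_names, node=0):
    """Hypothetical helper: convert a fitted tree into nested dicts."""
    t = clf.tree_
    left, right = t.children_left[node], t.children_right[node]
    if left == -1:
        # Leaf node: report the class distribution stored at this node
        # (counts or fractions, depending on the scikit-learn version).
        return {"name": "leaf", "value": t.value[node][0].tolist()}
    label = f"{feature_names[t.feature[node]]} <= {t.threshold[node]:.2f}"
    return {
        "name": label,
        "children": [
            tree_to_dict(clf, feature_names, left),
            tree_to_dict(clf, feature_names, right),
        ],
    }


nested = tree_to_dict(clf, iris.feature_names)
print(json.dumps(nested, indent=2))
```

The resulting JSON could then be handed to a browser-side D3.js layout; that rendering step is outside the scope of this sketch.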
Jobs
Data Scientist - Microsoft, Redmond, WA
Join the excitement of Machine Learning in the Cloud at Microsoft! We are a fast-paced data science team in the Microsoft Cloud + Enterprise organization building machine-learning-powered intelligent web services and end-to-end solutions for scenarios in diverse enterprise and consumer verticals. We are looking for applied scientists who are passionate about applying machine learning and data mining techniques to a variety of exciting applications for enterprises and consumers...
Training & Resources
Vowpal_Wabbit: The Redis of the Data Science Community
vowpal_wabbit, or vw, is an online learning program originally built by Yahoo! Research (now Microsoft Research). It's fairly basic to use, it's a command line tool and it's mostly written in C++. Even the website has a great Web 1.0 feel to it. Using vw basically maxes out your data science style points. It's like not wearing a mask in hockey, or having lift tickets from 3 foreign countries on your ski jacket. Yeah, it's that cool...
Top 22 Free Online Courses In Computer Science And Artificial Intelligence!
Go on and make a career out of computer science and artificial intelligence! Here are the resources that will teach you all the tricks of the trade!...
Classification with scikit-learn
This post looks into the problem of classification, a situation in which a response is a categorical variable. We will build upon the techniques that we previously discussed in the context of regression and show how they can be transferred to classification problems...
Domino - Data Analysis Accelerated
Easily run R, Python, and Matlab code in the cloud. Automatic version control and collaboration for data, code, and results...
P.S. Did you enjoy the newsletter? Do you have friends/colleagues who might like it too? If so, please forward it along - we would love to have them onboard :)