Data Science Weekly - Issue 21
Issue #21, April 17, 2014
Editor Picks
Quantum Machine Learning: Seth Lloyd (MIT) talk at Google Quantum AI Lab
Machine learning algorithms find patterns in big data sets. This talk presents quantum machine learning algorithms that give exponential speed-ups over their best existing classical counterparts...
Data Workflows for Machine Learning
In this in-depth video, we compare and contrast several open source frameworks that have emerged for Machine Learning workflows, including KNIME, IPython Notebook and related Python libraries, Cascading, Cascalog, Scalding, Summingbird, Spark/MLbase, MBrace on .NET, etc. ...
AI Developers to Power a New Generation of Context-Driven AI
Spurred by recent advances in machine learning and AI, context-aware intelligent assistants represent the new frontier of content search and discovery. Companies leverage unstructured data (photographs, videos, chat logs, documents) to make better-informed business decisions and to automate processes. Now they are building humanlike capabilities into automated workflows to augment what's possible in business and for humanity...
Data Science Articles & Videos
Deep Learning (or not): The why's have it
Deep Learning is the big thing in Machine Learning right now: it just netted a win in the Galaxy Zoo Kaggle contest, and it is the thing major media outlets are talking about. The technique rightfully deserves serious attention, as it has proven effective in a large number of tasks, but I see one key problem with it that is similar to, but actually worse than, that of its cousin Neural Networks: it is a black box that cannot explain "why" at even the highest level...
Huge New York Development Project Becomes a Data Science Lab
Hudson Yards is a huge real estate development project, the largest in New York since Rockefeller Center... the sprawling development will also become an urban laboratory for data science, a “quantified community”...
Netflix Reveals All (well, at least a lot)
The Netflix content team is tasked with licensing, purchasing, or developing the best TV shows and movies for its 44 million users in 41 countries. This talk gave an overview of what the content data science teams do for the organization toward the goals of identifying the characteristics of an “ideal” content library, predicting demand for titles that Netflix does not have, determining the customer impact of adding or losing sets of content, and helping to identify the next original series...
Hadoop's rise: Why you don't need petabytes for a Big Data opening
People are often hung up on the volume aspect of big data, but other factors can be just as telling in the issues they raise for business...
Devising Our Data Destiny
The Hadoop ecosystem is becoming the incumbent data platform. It brings powerful new capabilities for collecting and analyzing data. This technology can be used for both harm and benefit. As a society, we should deliberately address this potential, developing pragmatic approaches...
Using Machine Learning To Pick Your Lottery Numbers
It is well known that people are not creative when they choose their lottery numbers: they pick their birth dates, draw particular shapes on the grid (lines, crosses, ...), etc. The goal of this notebook is to explore whether Machine Learning can help us distinguish "human-generated" combinations from "machine-generated" (a.k.a. random) ones...
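The notebook's own method isn't described here, so the following is only a minimal sketch of the idea under loudly stated assumptions: a 6-from-49 lottery, and a made-up "human" model in which each pick is a birthday-style number (1 to 31) with probability 0.8. The classifier is a one-feature decision stump on the share of numbers that are at most 31, a deliberately simple stand-in rather than whatever model the notebook uses. The names `human_combination`, `fit_stump`, etc. are hypothetical.

```python
import random

N_MAX = 49          # assumption: a 6-from-49 lottery
PICKS = 6
BIRTHDAY_MAX = 31   # numbers that can be a day of the month

def machine_combination(rng):
    """A truly random ticket: PICKS distinct numbers from 1..N_MAX."""
    return sorted(rng.sample(range(1, N_MAX + 1), PICKS))

def human_combination(rng, p_birthday=0.8):
    """Hypothetical human model: each pick is a 'birthday' number with prob p_birthday."""
    chosen = set()
    while len(chosen) < PICKS:
        upper = BIRTHDAY_MAX if rng.random() < p_birthday else N_MAX
        chosen.add(rng.randint(1, upper))
    return sorted(chosen)

def birthday_share(combo):
    """Single feature: fraction of the ticket's numbers that are <= 31."""
    return sum(n <= BIRTHDAY_MAX for n in combo) / PICKS

def make_dataset(rng, n_per_class):
    """Pairs of (feature, label); label 1 = human-generated, 0 = machine-generated."""
    data = [(birthday_share(human_combination(rng)), 1) for _ in range(n_per_class)]
    data += [(birthday_share(machine_combination(rng)), 0) for _ in range(n_per_class)]
    return data

def accuracy(data, t):
    """Accuracy of the rule 'predict human if share >= t'."""
    return sum((x >= t) == bool(y) for x, y in data) / len(data)

def fit_stump(data):
    """Choose the threshold t that maximises training accuracy."""
    best_t, best_acc = None, -1.0
    for t in sorted({x for x, _ in data}):
        acc = accuracy(data, t)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

rng = random.Random(0)
train, test = make_dataset(rng, 2000), make_dataset(rng, 2000)
t = fit_stump(train)
print(f"threshold={t:.3f}  test accuracy={accuracy(test, t):.3f}")
```

Under this toy model, a random ticket has all six numbers at or below 31 only about 5% of the time (C(31,6)/C(49,6)), which is why even a single feature beats chance; real human picks are messier, so the notebook's richer approach is the one to read.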
Music Discovery at Spotify
Presentation about Spotify's music discovery system at MLConf NYC 2014...
Quantitative Finance Applications in R - 5: Intro to Monte Carlo Simulation
We start with a simple example using R, focusing on a single security. Although perhaps seemingly trivial, this lays the foundation for more complex cases such as multiple correlated securities and stochastic interest rates...
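The tutorial itself is in R; as a rough Python analogue of the single-security case, here is a minimal sketch that assumes the price follows geometric Brownian motion with illustrative parameters (S0 = 100, 5% drift, 20% volatility, one year), which are our choices, not the tutorial's. It simulates terminal prices, then estimates the expected price (with a standard error) and the probability of ending below the starting price.

```python
import math
import random
import statistics

def simulate_terminal_prices(s0, mu, sigma, horizon, n_paths, seed=None):
    """Simulate S_T under geometric Brownian motion (an assumed price model):
    S_T = S0 * exp((mu - sigma**2 / 2) * T + sigma * sqrt(T) * Z), Z ~ N(0, 1)."""
    rng = random.Random(seed)
    drift = (mu - 0.5 * sigma ** 2) * horizon
    scale = sigma * math.sqrt(horizon)
    return [s0 * math.exp(drift + scale * rng.gauss(0.0, 1.0)) for _ in range(n_paths)]

def summarize(prices, s0):
    """Monte Carlo estimates: mean terminal price, its standard error, P(S_T < S0)."""
    mean = statistics.fmean(prices)
    std_err = statistics.stdev(prices) / math.sqrt(len(prices))
    prob_loss = sum(p < s0 for p in prices) / len(prices)
    return mean, std_err, prob_loss

# Illustrative parameters only (hypothetical, not from the tutorial).
prices = simulate_terminal_prices(100.0, 0.05, 0.20, 1.0, 100_000, seed=42)
mean, std_err, prob_loss = summarize(prices, 100.0)
print(f"E[S_T] ~ {mean:.2f} +/- {1.96 * std_err:.2f} (exact: {100 * math.exp(0.05):.2f})")
print(f"P(S_T < S0) ~ {prob_loss:.3f}")
```

For GBM the exact mean is S0·exp(μT), so comparing the simulated mean against it is a cheap sanity check before moving on to correlated securities, where closed forms are less convenient.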
Jobs
Sr Data Scientist, Marketing Algorithms/Analytics - Netflix, Los Gatos, CA
Netflix is seeking an outgoing, curious, interdisciplinary data expert to work as a data miner, statistical modeler and algorithm designer. You'll have the opportunity to work closely with marketing decision makers and other senior data scientists to better understand and optimize our different customer acquisition channels. You'll bring a combination of mathematical rigor and innovative algorithm design to create recipes that efficiently extract relevant insights from billions of rows of data to meaningfully improve our operations...
Training & Resources
Mathematicalmonk's Channel
Extensive set of videos teaching Machine Learning...
PyCon US 2014 – Videos (Tutorials)
The full set of video tutorials is now online...
Prediction.io Open Source Machine Learning Server
Prediction.io is an open source machine learning server for predictive solutions, such as personalization or recommendations, built on top of scalable frameworks such as Hadoop and Cascading to handle Big Data...
Outliers
Many machine learning and data analysis tutorials contain some version of the following phrase as one of the preliminary steps to building a model: “Identify outliers in your data and remove them.” Sounds simple, right? Unfortunately, almost none of these tutorials spend any time discussing what an outlier actually is, or what removing data that is fairly or unfairly labeled as an outlier does to your model...
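To make those consequences concrete, here is a minimal sketch using one common rule of thumb, Tukey's 1.5 × IQR fences (our choice; the article may define outliers differently). Every point in the sample is a genuine draw from the same skewed lognormal distribution, so anything the rule removes is, by construction, "unfairly" labeled. The helper names are hypothetical.

```python
import random
import statistics

def tukey_fences(data, k=1.5):
    """Return the (low, high) fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def drop_outliers(data, k=1.5):
    """Keep only the points that fall inside the Tukey fences."""
    low, high = tukey_fences(data, k)
    return [x for x in data if low <= x <= high]

rng = random.Random(0)
# All points come from one lognormal(0, 1) distribution: none are errors.
sample = [rng.lognormvariate(0.0, 1.0) for _ in range(100_000)]
kept = drop_outliers(sample)
flagged_share = 1 - len(kept) / len(sample)

print(f"flagged as outliers: {flagged_share:.1%}")
print(f"mean before: {statistics.fmean(sample):.3f}  after: {statistics.fmean(kept):.3f}")
```

For a lognormal(0, 1) population the upper fence sits near 4.14, so roughly 8% of perfectly valid points get flagged, and the estimated mean falls from about 1.65 to about 1.19. Any model fitted afterwards inherits that bias.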
Books
Who's #1?: The Science of Rating and Ranking
An interesting and approachable look at the world of ranking algorithms...
"Who's #1? is an excellent survey of the fundamental ideas behind mathematical rating systems. Once a realm of sports enthusiasts, ranking things is becoming a vital tool in many information-age applications. Langville and Meyer compare and contrast a variety of models, explaining the mathematical foundations and motivation. Readers of this book will be inspired to further explore this exciting field."
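As a taste of the mathematical rating systems described above, here is a minimal sketch of Colley's method, one classic scheme in this area: solve C r = b, where C = 2I + diag(games played) − (pairwise game counts) and b_i = 1 + (wins_i − losses_i)/2. This is our own illustration with exact `Fraction` arithmetic and a hypothetical three-team season, not code from the book.

```python
from fractions import Fraction

def solve(A, b):
    """Gauss-Jordan elimination with exact Fraction arithmetic."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        # The Colley matrix is positive definite, so a nonzero pivot always exists.
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col] / M[col][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def colley_ratings(teams, games):
    """games: list of (winner, loser) pairs. Returns {team: Colley rating}."""
    idx = {t: i for i, t in enumerate(teams)}
    n = len(teams)
    C = [[Fraction(2 if i == j else 0) for j in range(n)] for i in range(n)]
    b = [Fraction(1) for _ in range(n)]
    for winner, loser in games:
        w, l = idx[winner], idx[loser]
        C[w][w] += 1          # each game adds 1 to both diagonal entries
        C[l][l] += 1
        C[w][l] -= 1          # and -1 to the pairwise off-diagonal entries
        C[l][w] -= 1
        b[w] += Fraction(1, 2)  # b_i = 1 + (wins - losses) / 2
        b[l] -= Fraction(1, 2)
    return dict(zip(teams, solve(C, b)))

# Hypothetical season: A beats B and C, B beats C.
ratings = colley_ratings(["A", "B", "C"], [("A", "B"), ("A", "C"), ("B", "C")])
for team, rating in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{team}: {rating}")  # A: 7/10, B: 1/2, C: 3/10
```

A pleasant property of Colley's method is that the ratings always average 1/2, so a team above 0.5 has done better than average given its schedule.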
For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.
P.S. Did you enjoy the newsletter? Do you have friends/colleagues who might like it too? If so, please forward it along; we would love to have them on board :)