Data Science Weekly - Issue 95
Issue #95 - September 17, 2015
Editor Picks
Intelligent machines: Making AI work in the real world
As part of the BBC's Intelligent Machines season, Google's Eric Schmidt has penned an exclusive article on how he sees artificial intelligence developing, why it is experiencing such a renaissance and where it will go next...
DeepHear - Composing and harmonizing music with neural networks
I trained a network to generate random bars of music, based on Scott Joplin's ragtime music. It is a fully connected Deep Belief Network, set up to perform an auto-encoding task. The results sound something like this...
Why is a Raven Like a Writing Desk?
Alice in Wonderland restyled by a content/style ConvNet, using Justin Johnson's Torch implementation...
A Message from this week's Sponsor:
Machine Learning and Big Data: Business Challenges
October 8, 2015 - Berlin
Join data scientists from Badoo, Deutsche Telekom, GfK, TripAdvisor, AIG and other leading innovators who will gather to discuss the challenges they face in machine learning and the future of big data at a business session organized by Yandex Data Factory as part of the international scientific conference 'Machine Learning: Prospects and Applications'. Fully interactive and hands-on, this business session eschews trendy topics and buzzwords, aiming instead to explore the essence of the changes that new technologies bring to the real world and to the lives of both ordinary people and big enterprises. Register here.
Data Science Articles & Videos
Sharks, Landsharks, Geoplotting, and KDTrees!
So now with the end of summer officially here (at least in the northern hemisphere), we thought it would be interesting to dig into some shark attack data. In this post, we'll look through the Global Shark Attack File, check out some of the characteristics of shark attacks and then dive into some geo-plotting with Matplotlib Basemap...
3 minute summary of The Empire Strikes Back
Applying automatic text summarization to film...
Giraffe: Using Deep Reinforcement Learning to Play Chess
This report presents Giraffe, a chess engine that uses self-play to discover all its domain-specific knowledge, with minimal hand-crafted knowledge given by the programmer...
Letting Users Choose Recommender Algorithms: An Experimental Study
As one way of taking advantage of the relative merits of different algorithms, we gave users the ability to change the algorithm providing their movie recommendations and studied how they make use of this power... We examine log data from user interactions with this new feature to understand whether and how users switch among recommender algorithms, and select a final algorithm to use...
Machine Learning Trick of the Day: Hutchinson's Trick
Hutchinson's estimator is a simple way to obtain a stochastic estimate of the trace of a matrix. This is a simple trick that uses randomisation to transform the algebraic problem of computing the trace into the statistical problem of computing an expectation of a quadratic function...
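For readers who want to try the idea, here is a minimal sketch of the trick described above: the trace of A equals the expectation of z^T A z when the entries of z are independent random signs (+1 or -1), so averaging that quadratic form over many random draws estimates the trace. The function names (hutchinson_trace, dense_matvec), the choice of random-sign probes and the small test matrix are our own illustrative assumptions, not code from the linked post.

```python
import random


def hutchinson_trace(matvec, n, num_samples=1000, seed=0):
    """Estimate trace(A) using only products A @ z (hypothetical helper).

    matvec: function mapping a length-n list z to the list A @ z.
    Uses random-sign (Rademacher) probe vectors, for which E[z^T A z] = trace(A).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        # Draw a probe vector with independent entries in {-1, +1}.
        z = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        az = matvec(z)
        # Accumulate the quadratic form z^T A z.
        total += sum(zi * ai for zi, ai in zip(z, az))
    return total / num_samples


def dense_matvec(a):
    """Wrap a dense matrix (list of rows) as a matvec function (illustration only)."""
    def matvec(v):
        return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]
    return matvec


# Small symmetric example matrix, made up for this sketch.
A = [[2.0, 0.5, 0.0],
     [0.5, 3.0, 1.0],
     [0.0, 1.0, 4.0]]

exact = sum(A[i][i] for i in range(len(A)))
estimate = hutchinson_trace(dense_matvec(A), n=len(A), num_samples=5000, seed=0)
print("exact trace:", exact)
print("Hutchinson estimate:", estimate)
```

The estimator only ever touches A through matrix-vector products, which is why it is attractive when A is too large or too implicit to read its diagonal directly. For a matrix this small you would simply sum the diagonal.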
Time Maps: Visualizing Discrete Events Across Many Timescales
In this blog post, I’ll describe a technique for visualizing many events across multiple timescales in a single image, where little or no zooming is required...
Fujitsu Technology that uses machine learning to quickly generate predictive models from massive datasets
Fujitsu Laboratories today announced the development of a machine-learning technology that can generate highly accurate predictive models from datasets of more than 50 million records in a matter of hours...
Analyzing 1.7 Billion Reddit Comments with Blaze and Impala
In this post, we'll use Blaze and Impala to interactively query and explore a data set of approximately 1.7 billion comments (975 GB uncompressed) from the reddit website from October 2007 to May 2015...
Forget Dark Energy: MIT Physicists Have Finally Cracked Overhand Knots
In a study recently accepted for publication in Physical Review Letters, engineers at MIT and Pierre et Marie Curie University in Paris offer a new fundamental theory of knots based on relationships between topology, the mathematics of spatial relationships, and the basic mechanics of friction and pliability...
Jobs
Data Scientist - Trunk Club - Chicago
At Trunk Club, we develop services that help our employees make data-driven decisions, from stylists to merchandising. Every team at Trunk Club relies on the data we collect about our customers and how we interact with them to make meaningful decisions. Our tool chain strives to make it easy and painless to turn an algorithm into an API or run A/B tests using multivariate models. Our work drives Trunk Club - we enable the entire company to iterate faster and help accelerate business growth. You will have visibility into every team and be connected directly to executives without bureaucracy...
Training & Resources
Bayesian Data Analysis Python Demos
IPython notebooks for Bayesian Data Analysis - great teaching materials...
Cheatsheet – Python & R codes for common Machine Learning Algorithms
Here’s a collection of the 10 most commonly used machine learning algorithms with code in Python and R. Considering the rising usage of machine learning in building models, this cheat sheet is a good code guide to help you put these machine learning algorithms to use...
Evaluating Machine Learning Models (free Ebook)
A Beginner's Guide to Key Concepts and Pitfalls...
Books
The Signal and the Noise: Why So Many Predictions Fail — but Some Don't
Very well reviewed...
"This is the best general-readership book on applied statistics that I've read. Short review: if you're interested in science, economics, or prediction: read it. It's full of interesting cases, builds intuition, and is a readable example of Bayesian thinking."
For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.
P.S. Interested in reaching fellow readers of this newsletter? Consider sponsoring! Email us for details :)
All the best, Hannah & Sebastian