Data Science Weekly - Issue 52
Issue #52 Nov 20 2014
Special Message: It's been a year! (This is Issue #52!)
Many thanks to everyone who's subscribed, enjoyed, sent messages etc. Looking forward to the next 52 issues! ...
Editor Picks
Personalized Recommendations at Etsy
Providing personalized recommendations is important to our online marketplace. It benefits both buyers and sellers: buyers are shown interesting products that they might not have found on their own, and products get more exposure beyond the seller’s own marketing efforts. In this post we review some of the methods we use for making recommendations at Etsy...
Machine-Learning Algorithm Ranks the World's Most Notable Authors
Deciding which books to digitise when they enter the public domain is tricky, unless you have an independent ranking of the most notable authors...
Deep Visual-Semantic Alignments for Generating Image Descriptions
We present a model that generates free-form natural language descriptions of image regions. Our model leverages datasets of images and their sentence descriptions to learn about the inter-modal correspondences between text and visual data. Our approach is based on a novel combination of Convolutional Neural Networks over image regions, bidirectional Recurrent Neural Networks over sentences, and a structured objective that aligns the two modalities through a multimodal embedding...
Data Science Articles & Videos
Flock: Hybrid Crowd-Machine Learning Classifiers
We present hybrid crowd-machine learning classifiers: classification models that start with a written description of a learning goal, use the crowd to suggest predictive features and label data, and then weigh these features using machine learning to produce models that are accurate and use human-understandable features...
Show and Tell: A Neural Image Caption Generator
Automatically describing the content of an image is a fundamental problem in artificial intelligence that connects computer vision and natural language processing. In this paper, we present a generative model based on a deep recurrent architecture that combines recent advances in computer vision and machine translation and that can be used to generate natural sentences describing an image...
Building a Recommendation Engine for Reddit. Part 1
This is the breakdown of how I built Find a Sub, a Recommendation Engine for Reddit...
Fun with Machine Learning: Does your model actually work?
If you’re building a model to predict something, the first question anyone’s going to ask you is: “So, how well does it work?”...
Putting the Magic in Data Science
I argue that magic in data science often comes from combining various “tricks” in novel ways. I describe four common tricks we use at Facebook, as well as a grab bag of others that I’ve found useful...
Neural Turing Machines
An attempt at implementing the system described in "Neural Turing Machines"...
Scaling Language Understanding via Joint Multilingual Learning
In this talk, Andreea Bodnari (Chief Data Scientist at Movable Ink) presents two probabilistic models that systematically model both the depth and the breadth of natural languages for two different linguistic tasks: syntactic parsing and joint learning of named entity recognition and coreference resolution...
Factorbird - a Parameter Server Approach to Distributed Matrix Factorization
We present Factorbird, a prototype of a parameter server approach for factorizing large matrices with Stochastic Gradient Descent-based algorithms...
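For readers new to the technique, here is a minimal sketch of SGD-based matrix factorization. It is plain single-machine Python and does not reflect Factorbird's parameter server design. The function names, hyperparameters and toy data are all illustrative assumptions.

```python
import random


def factorize_sgd(entries, n_rows, n_cols, rank=2, lr=0.02, reg=0.01,
                  epochs=2000, seed=0):
    """Approximate a sparse matrix as U * V^T with plain SGD.

    entries: list of (row, col, value) triples for the observed cells.
    All hyperparameter defaults are illustrative, not tuned values.
    """
    rng = random.Random(seed)
    # Small random initial factors, one vector of length `rank` per row/column.
    U = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_rows)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_cols)]
    samples = list(entries)
    for _ in range(epochs):
        rng.shuffle(samples)  # visit observed cells in random order
        for i, j, value in samples:
            pred = sum(U[i][k] * V[j][k] for k in range(rank))
            err = value - pred
            for k in range(rank):
                u, v = U[i][k], V[j][k]
                # Gradient step on squared error with L2 regularization.
                U[i][k] += lr * (err * v - reg * u)
                V[j][k] += lr * (err * u - reg * v)
    return U, V


def predict(U, V, i, j):
    """Reconstructed value for cell (i, j)."""
    return sum(u * v for u, v in zip(U[i], V[j]))


def rmse(entries, U, V):
    """Root mean squared error over the given observed cells."""
    total = sum((value - predict(U, V, i, j)) ** 2 for i, j, value in entries)
    return (total / len(entries)) ** 0.5


# Toy data: a 3x2 matrix whose cells are products of row and column factors.
toy_entries = [(0, 0, 1.0), (0, 1, 2.0), (1, 0, 2.0),
               (1, 1, 4.0), (2, 0, 3.0), (2, 1, 6.0)]
U, V = factorize_sgd(toy_entries, n_rows=3, n_cols=2)
print("training RMSE:", rmse(toy_entries, U, V))
```

In a parameter server setting the factor vectors would live on shared servers and many workers would apply updates like these concurrently. The sketch above leaves that part out entirely.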
Artificial Intelligence is a tool, not a threat
Recently there has been a spate of articles in the mainstream press, and a spate of high profile people who are in tech but not AI, speculating about the dangers of malevolent AI being developed, and how we should be worried about that possibility. I say relax. Chill...
Jobs
Senior Data Scientist - Walmart Labs - San Bruno, CA
Work on building state-of-the-art data systems that ingest, model and analyze massive flows of data from online, social, mobile and offline commerce/user activity to create models that achieve business objectives. Use cutting-edge machine learning, data mining and optimization algorithms underneath it all to analyze all this data on top of Hadoop/HBase/Hive/Pig. Have the aptitude to see through and analyze the complexity of intricate statistical and learning algorithms, and design roadmaps for efficient and scalable implementation while using numerical routines tailored to the problem...
Training & Resources
#WhiteboardWalkthrough - Deep Learning
Sungwook Yoon, a Data Scientist at MapR, walks you through Deep Learning versus Neural Networks, two modeling methods in Machine Learning...
Synaptic: Architecture-free neural network library for node.js and browser
Synaptic is a JavaScript neural network library for node.js and the browser. Its generalized algorithm is architecture-free, so you can build and train basically any type of first-order or even second-order neural network architecture...
Seaborn v0.5.0 release
This is a major release from 0.4. Highlights include new functions for plotting heatmaps, possibly while applying clustering algorithms to discover structured relationships. These functions are complemented by new custom colormap functions and a full set of IPython widgets that allow interactive selection of colormap parameters. The palette tutorial has been rewritten to cover these new tools and more generally provide guidance on how to use color in visualizations. There are also a number of smaller changes and bugfixes...
Books
The Innovators: How a Group of Hackers, Geniuses, and Geeks Created the Digital Revolution
Following his biography of Steve Jobs, The Innovators is Walter Isaacson’s revealing story of the people who created the computer and the Internet...
"A sweeping history of the digital revolution, and the curious partnerships and pulsing rivalries that inhabit it..."
For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.
P.S. Enjoyed the newsletter? Please forward it to friends and peers - we'd love to have them onboard too :-) - All the best, Hannah & Sebastian