Data Science Weekly - Issue 94
Issue #94 September 10 2015
Editor Picks
3 Million Judgements of Books by their Covers
Last week, my friend Nate Gagnon and I launched Judgey, a browser-based game that gave users the opportunity to literally judge books by their covers. We’re both makers, and Nate is a writer, and I’m technical. So excuse me if I get technical — I promise to reward you with pretty graphs...
Building Data Products
I've spent much of the last decade building various products or tools, from healthcare billing to finance and now advertising, all focused on extracting the valuable information contained in some dataset(s). This is meant to be a quick summary of some lessons learned; hopefully some readers find it useful...
Implementing a Neural Network from Scratch – An Introduction
In this post we will implement a simple 3-layer neural network from scratch. We won’t derive all the math that’s required, but I will try to give an intuitive explanation of what we are doing. I will also point to resources for you to read up on the details...
A Message from this week's Sponsor: Continuum Analytics
Spending too much time managing R & Python packages? Not with Anaconda.
Download Anaconda - the modern open source enterprise-ready analytics platform - for FREE, and spend more time on data science.
Data Science Articles & Videos
What's the Difference Between Data Science and Statistics?
These days, data science is hot. The job of "data scientist" was referred to by the Harvard Business Review as the "Sexiest Job of the 21st Century." Why did data science come to exist? And just what is it that distinguishes data science from statistics?...
Neural Abstraction Pyramid to Semantic RGB-D Perception
At RE.WORK Deep Learning Summit in London, Sven will discuss the Neural Abstraction Pyramid, a deep learning architecture, in which layer-by-layer unsupervised learning creates increasingly abstract image representations. The presentation will also focus on more recent work on deep learning for object-class segmentation of images and semantic RGB-D perception. We caught up with Sven ahead of the summit to hear more about his work and the recent advancements in deep learning...
Inferring Algorithmic Patterns with Stack
With the recent success of deep learning, one can ask how far are we from developing a real artificial intelligence as shown in science fiction movies. In our recent work, we show that certain simple sequential patterns cannot be learned by these popular deep learning approaches as they rely mostly on remembering previously seen patterns frequently appearing in the training data. We propose a novel sequence prediction approach which has the capability to learn these simple concepts...
The Fallacy of Placing Confidence in Confidence Intervals
Welcome to the web site for the (upcoming) paper "The Fallacy of Placing Confidence in Confidence Intervals." Here you will find a number of resources connected to the paper, including the paper itself, the supplement, teaching resources etc...
Machine Learning Trick of the Day: Gaussian Integral Trick
Today's trick, the Gaussian integral trick, is one that allows us to re-express a (potentially troublesome) function in an alternative form, in particular, as an integral of a Gaussian against another function...
All About Hadoop
President of Scale Unlimited, Ken talks about data mining, the industry, and valuable skills to learn if you're looking to enter the field...
Semi-Supervised Factored Logistic Regression for High-Dimensional Neuroimaging Data
Imaging neuroscience links human behavior to aspects of brain biology in ever-increasing datasets. Existing neuroimaging methods typically perform either discovery of unknown neural structure or testing of neural structure associated with mental tasks. However, testing hypotheses on the neural correlates underlying larger sets of mental tasks necessitates adequate representations for the observations. We therefore propose to blend representation modelling and task classification into a unified statistical learning problem...
Joint Optimization of Masks and Deep Recurrent Neural Networks for Monaural Source Separation
Monaural source separation is important for many real world applications. It is challenging because, with only a single channel of information available, without any constraints, an infinite number of solutions are possible. In this paper, we explore joint optimization of masking functions and deep recurrent neural networks for monaural source separation tasks, including monaural speech separation, monaural singing voice separation, and speech denoising...
Parameter Estimates for Regression: Least Squares, Gradient Descent & Monte Carlo Methods
In this post I will cover three ways to estimate parameters for regression models: least squares, gradient descent and Monte Carlo methods. The aim is to introduce important methods widely used in machine learning, such as gradient descent and Monte Carlo, by linking them to a use case which is widely used and understood by the different "data science" communities, such as linear regression...
Jobs
Data Scientist-Vine - Twitter - NYC
Vine is seeking a data scientist to help understand how people use Vine. We’re looking for folks with a passion for consumer internet products to help drive informed decisions for the team. You’ll work with enormous datasets, a top notch team, and cutting edge technology. Best of all, you’ll see your insights and findings turned into real products on a regular basis...
Training & Resources
The Civis API: Scale Up Your Data Science
In the final stretch of a major client project back in 2014, a fellow data scientist at Civis Analytics whispered a modest proposal to me: “We should be able to build all these models from massive amounts of survey data – and actually get some sleep.” A reasonable idea – and one that we ended up solving with the Civis API...
The Only Probability Cheatsheet You'll Ever Need
This is a 10-page probability cheatsheet compiled from Harvard's Introduction to Probability course, taught by Joe Blitzstein. The cheatsheet summarizes important probability concepts, formulas, and distributions, with figures, examples, and stories...
Linear Algebra Lectures
These video lectures of Professor Gilbert Strang teaching 18.06 were recorded live in the Fall of 1999 and are free. Recommended by several readers!...
Books
Naked Statistics: Stripping the Dread from the Data
Interesting take on the importance of statistics...
"While a great measure of the book’s appeal comes from Mr. Wheelan’s fluent style—a natural comedian, he is truly the Dave Barry of the coin toss set—the rest comes from his multiple real world examples illustrating exactly why even the most reluctant mathophobe is well advised to achieve a personal understanding of the statistical underpinnings of life" - New York Times
For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages, check out our resources page.
P.S. Interested in reaching fellow readers of this newsletter? Consider sponsoring! Email us for details :) - All the best, Hannah & Sebastian