Data Science Weekly - Issue 88
Issue #88 July 30 2015
Editor Picks
Hadley Wickham, the Man Who Revolutionized R
If you don’t spend much of your time coding in the open-source statistical programming language R, his name is likely not familiar to you, but the statistician Hadley Wickham is, in his own words, “nerd famous.” The kind of famous where people at statistics conferences line up for selfies, ask him for autographs, and are generally in awe of him. “It’s utterly utterly bizarre,” he admits. “To be famous for writing R programs? It’s just crazy.”...
Using Algorithms to Determine Character
Computers aren’t just doing hard math problems and showing us cat videos. Increasingly, they judge our character. Maybe we should be grateful...
Hinton's Dropout in 3 Lines of Python
Dropout is a vital feature in almost every state-of-the-art neural network implementation. This tutorial teaches how to add dropout to a neural network in only a few lines of Python code. Those who walk through this tutorial will finish with a working dropout implementation and with the intuition needed to add and tune it in any neural network they encounter...
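The tutorial's own three lines are not reproduced here. As a rough sketch of the same idea, the snippet below implements *inverted* dropout with NumPy: during training each unit is zeroed with probability `drop_prob` and the survivors are scaled up by `1 / (1 - drop_prob)`, so the layer can simply pass activations through unchanged at test time. (The original Hinton et al. formulation instead scales the weights at test time; the two are equivalent in expectation.) The function name and parameters are illustrative assumptions, not taken from the tutorial.

```python
import numpy as np

def dropout(activations, drop_prob, rng, training=True):
    """Inverted dropout: zero each unit with probability drop_prob, rescale survivors."""
    if not training or drop_prob == 0.0:
        # At test time the layer is the identity; the rescaling already happened in training.
        return activations
    keep_prob = 1.0 - drop_prob
    # Bernoulli mask: True where a unit is kept.
    mask = rng.random(activations.shape) < keep_prob
    # Dividing by keep_prob keeps the expected activation equal to the no-dropout value.
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
hidden = np.ones((4, 5))  # hypothetical hidden-layer activations
out = dropout(hidden, drop_prob=0.5, rng=rng)  # every entry is either 0.0 or 2.0
```

Because the rescaling happens during training, inference code needs no special case beyond calling the layer with `training=False`.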
A Message from this week's Sponsor
Online Learning, Only Better
Hack Reactor Remote Beta. Apply today to join an online classroom unlike any other. Attend our renowned immersive JavaScript coding school from wherever you are with Hack Reactor Remote Beta.
Data Science Articles & Videos
A Visual Introduction to Machine Learning
Using a data set about homes, we will create a machine learning model to distinguish homes in New York from homes in San Francisco...
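One simple way to start on a two-class problem like this is a decision stump: a single threshold on one numeric feature. The sketch below assumes elevation as that feature and uses a handful of invented homes (the numbers are hypothetical, not from the article's data set); it tries every observed value as a cut point and keeps the one with the highest training accuracy.

```python
def best_threshold(values, labels, positive):
    """Try each observed value t as the rule `value > t -> positive`; return the best t."""
    best_t, best_acc = None, -1.0
    for t in sorted(set(values)):
        # A prediction is correct when "above the threshold" agrees with "is positive".
        correct = sum((v > t) == (y == positive) for v, y in zip(values, labels))
        acc = correct / len(values)
        if acc > best_acc:  # strict '>' keeps the lowest threshold on ties
            best_t, best_acc = t, acc
    return best_t, best_acc

# Hypothetical elevations (feet) and cities, invented for illustration.
elevation_ft = [5, 12, 20, 33, 45, 70, 95, 150]
city = ["NY", "NY", "NY", "SF", "NY", "SF", "SF", "SF"]
threshold, accuracy = best_threshold(elevation_ft, city, positive="SF")
print(threshold, accuracy)  # 20 0.875
```

A full decision tree repeats this search recursively on each side of the split, usually with an impurity measure such as Gini instead of raw accuracy.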
What are Bloom filters?
“So how do you know whether someone’s read a post already?” asks Sarah, Medium’s legal counsel. We’re at dinner for my birthday, yet somehow have gotten onto the topic of work yet again. Sarah is referring to Medium’s personalised reading list — a recommendation system that I helped build — which suggested new and interesting posts to users when they visited Medium’s homepage...
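To make the reading-list question concrete, here is a minimal Bloom filter sketch in Python. It is not Medium's implementation; the post IDs, the bit-array size and the double-hashing scheme built on SHA-256 are all assumptions chosen for illustration. The key property: a "no" answer is always correct, while a "yes" may occasionally be a false positive.

```python
import hashlib

class BloomFilter:
    """Fixed-size bit array plus k hash positions per item."""

    def __init__(self, num_bits: int, num_hashes: int):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive k bit positions from two 64-bit hash values (double hashing).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step size
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means "definitely never added"; True means "probably added".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))

# Hypothetical post IDs a user has already read.
read_posts = BloomFilter(num_bits=1024, num_hashes=3)
for post_id in ["post-101", "post-202", "post-303"]:
    read_posts.add(post_id)
print(read_posts.might_contain("post-202"))  # True: added items are always found
```

The filter never stores the IDs themselves, which is why it stays small; the trade-off is tuning `num_bits` and `num_hashes` to keep the false-positive rate acceptable.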
Testing for Data Scientists
Trey Causey's slide deck from his talk at PyData 2015...
Why Deep Learning Works II: the Renormalization Group
Deep Learning is amazing. But why is it so successful? Is Deep Learning just old-school Neural Networks on modern hardware? Is it just that we have so much data now that the methods work better? Or is Deep Learning just really good at finding features? Researchers are working hard to sort this out...
How Google Translate squeezes Deep Learning onto a Phone
Today we announced that the Google Translate app now does real-time visual translation of 20 more languages. So the next time you’re in Prague and can’t read a menu, we’ve got your back. But how are we able to recognize these new languages?...
Depth First Development in NYC
The Diabetic Retinopathy challenge on Kaggle has just finished. The goal of the competition was to predict the presence and severity of the disease Diabetic Retinopathy from photographs of eyes. In this post I’ll explain my learning process and progress as I implemented my first ConvNet over the last 3 months...
The Brain vs Deep Learning Part I: Computational Complexity, or Why the Singularity Is Nowhere Near
In this blog post I will delve into the brain and explain its basic information processing machinery and compare it to deep learning. I do this by moving step-by-step along the brain's electrochemical and biological information processing pipeline and relating it directly to the architecture of convolutional nets...
End-to-end Continuous Speech Recognition using Attention-based Recurrent NN: First Results
We replace the Hidden Markov Model (HMM) which is traditionally used in continuous speech recognition with a bi-directional recurrent neural network encoder coupled to a recurrent neural network decoder that directly emits a stream of phonemes...
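The piece that lets the decoder "listen" to the encoder is the attention step. The NumPy sketch below shows a generic content-based, additive (Bahdanau-style) version: score every encoder frame against the current decoder state, normalise the scores with a softmax, and take the weighted sum as a context vector. This is an assumption-laden simplification; the paper's own mechanism also uses location information, which is omitted here, and the parameter names `W`, `V`, `w` and all sizes are hypothetical.

```python
import numpy as np

def softmax(x):
    z = x - x.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def additive_attention(decoder_state, encoder_states, W, V, w):
    """Return attention weights over encoder frames and the resulting context vector.

    decoder_state:  (d_dec,)
    encoder_states: (T, d_enc), e.g. outputs of a bidirectional RNN encoder
    W: (d_att, d_dec), V: (d_att, d_enc), w: (d_att,)
    """
    # Score each encoder frame j as w^T tanh(W s + V h_j); result has shape (T,).
    scores = np.tanh(decoder_state @ W.T + encoder_states @ V.T) @ w
    weights = softmax(scores)              # non-negative, sums to 1 over the T frames
    context = weights @ encoder_states     # (d_enc,) weighted summary of the input
    return weights, context

# Random stand-ins for learned parameters and RNN outputs (hypothetical sizes).
rng = np.random.default_rng(0)
T, d_enc, d_dec, d_att = 6, 8, 4, 5
encoder_states = rng.standard_normal((T, d_enc))
decoder_state = rng.standard_normal(d_dec)
W = rng.standard_normal((d_att, d_dec))
V = rng.standard_normal((d_att, d_enc))
w = rng.standard_normal(d_att)
weights, context = additive_attention(decoder_state, encoder_states, W, V, w)
```

At each output step the decoder would consume `context` (together with its previous output) to emit the next phoneme, then recompute the weights for the following step.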
Airbnb Needs To Be Better At Search Than Google
And for many holiday travelers these days, that means scouring Airbnb to find that perfect oceanfront cottage that sleeps eight, comes with a washer, dryer, Wi-Fi, and free parking on the premises. But what most people won’t realize when they nestle into their respective crash pads this July 4th is just how complex that search process really is...
Jobs
Software Engineer, Machine Intelligence - Yik Yak - Atlanta, GA Yik Yak, the hyperlocal anonymous social media platform, is looking for talented and creative software engineers with applied experience in machine learning (ML) and/or natural language processing (NLP) to join our new Machine Intelligence team, where we are building state-of-the-art systems that perform, at scale, a variety of machine intelligence tasks including bullying detection, topic modeling, trend discovery, and advanced visualizations, with the overarching goal of better understanding our millions of users and connecting them to the content they care most about. Are you excited at the thought of drinking from the Yak firehose?...
Training & Resources
Guide to Linear Regression
Linear regression is one of the first things you should try if you’re modeling a linear relationship (actually, non-linear relationships too!). It’s fairly simple, and probably the first thing to learn when tackling machine learning...
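A short NumPy sketch of ordinary least squares illustrates both halves of that claim. The helper name `fit_linear_regression` and the synthetic data are our own assumptions. The second fit shows why "non-linear relationships too" works: adding `x**2` as an extra feature keeps the model linear in its coefficients, so the same solver applies.

```python
import numpy as np

def fit_linear_regression(X, y):
    """Ordinary least squares with an intercept column prepended to X."""
    X = np.asarray(X, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef  # [intercept, slope_1, ..., slope_p]

# Synthetic, noise-free data so the recovered coefficients are easy to check.
x = np.linspace(0, 5, 20)
y_line = 1.0 + 2.0 * x
coef_line = fit_linear_regression(x, y_line)  # approximately [1.0, 2.0]

# A curved relationship, fitted by treating x**2 as just another feature.
y_curve = 0.5 - 1.0 * x + 3.0 * x**2
coef_curve = fit_linear_regression(np.column_stack([x, x**2]), y_curve)  # approximately [0.5, -1.0, 3.0]
```

With real, noisy data the coefficients will only approximate the underlying relationship, and it is worth checking residuals before trusting the fit.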
Python Data Bikeshed: "What PyData library should I use?"
A talk that discusses many of the main Python data libraries in some depth...
Chapter 6: Deep Learning by Michael Nielsen
In this chapter, we'll develop techniques which can be used to train deep networks, and apply them in practice...
Books
Discovering Statistics Using R
Recommended by several readers of the newsletter...
"The book is a great overview of statistics concepts and provides a gentle, yet comprehensive, introduction to the R language..."
For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.
P.S. Interested in reaching fellow readers of this newsletter? Consider sponsoring! Email us for details :) - All the best, Hannah & Sebastian