Data Science Weekly - Issue 68
Issue #68 - March 12, 2015
Editor Picks
Where It All Started: How I Became a Data Scientist - (1) Follow the Data
I thought I’d ease into this more technical subject by answering a question that I get asked many times: “how did you end up as a social media data scientist from your biophysics PhD background?”...
How PayPal uses deep learning and detective work to fight fraud
Today, said Wang, PayPal’s senior director of global risk sciences, “The fraudsters we’re interacting with are… very unique and very innovative. …Our fraud problem is a lot more complex than anyone can think of.” In deep learning, though, Wang and her team might have found a way to help level the playing field between PayPal and criminals who want to exploit the online payment platform...
On the Case at Mount Sinai, It’s Dr. Data
Jeffrey Hammerbacher is a number cruncher — a Harvard math major who went from a job as a Wall Street quant to a key role at Facebook to a founder of a successful data start-up. But five years ago, he was given a diagnosis of bipolar disorder, a crisis that fueled in him a fierce curiosity in medicine — about how the body and brain work and why they sometimes fail. The more he read and talked to experts, the more he became convinced that medicine needed people like him: skilled practitioners of data science who could guide scientific discovery and decision-making...
A Message from this week's Sponsor
Want to be a Data Scientist, but don't know where to start?
Learn essential Data Science skills in SlideRule's Intro to Data Science Workshop. In this online bootcamp, you'll learn R, data wrangling, analytics and visualization by working on real projects, with 1-on-1 mentorship from expert Data Scientists from LinkedIn, Glassdoor, Trulia and Stripe.
Spots are limited; registration ends in 48 hours!
Data Science Articles & Videos
Invasion of algorithms: Modern-day equations which can rule our lives
These are equations which, by processing huge amounts of micro-data, can predict our behaviour - but are they for better or worse?...
If an Algorithm Wrote This, How Would You Even Know?
Let me hazard a guess that you think a real person has written what you’re reading. Maybe you’re right. Maybe not. Perhaps you should ask me to confirm it the way your computer does when it demands that you type those letters and numbers crammed like abstract art into that annoying little box. Because, these days, a shocking amount of what we’re reading is created not by humans, but by computer algorithms...
Terence Tao: the Mozart of maths
While the clichéd maths genius is a socially awkward recluse, Adelaide-born Terry Tao is refreshingly normal - and no less a prodigy for that. Stephanie Wood meets him...
Learning a Convolutional Neural Network for Non-uniform Motion Blur Removal
In this paper, we address the problem of estimating and removing non-uniform motion blur from a single blurry image. We propose a deep learning approach to predicting the probabilistic distribution of motion blur at the patch level using a convolutional neural network (CNN)...
Can Spark Streaming survive Chaos Monkey?
Netflix is a data-driven organization that places emphasis on the quality of data collected and processed. With Spark Streaming as our choice of stream processor, we set out to evaluate and share the resiliency story for Spark Streaming in the AWS cloud environment. A Chaos Monkey based approach, which randomly terminated instances or processes, was employed to simulate failures...
1075 Artworks Ordered by Similarity
This one is quite interesting. At first it looks like there is not much order going on, but on closer inspection the algorithm actually manages to group paintings by the same artist close together. And it does that even for paintings that are done in different techniques...
A Word is Worth a Thousand Vectors
Standard natural language processing (NLP) is a messy and difficult affair. It requires teaching a computer about English-specific word ambiguities as well as the hierarchical, sparse nature of words in sentences. At Stitch Fix, word vectors help computers learn from the raw text in customer notes...
Trust The Algorithms, Not The Data
For the purpose of this article though I’d like to focus on the (I think) much more intriguing case of “random” error...
Do We Need More Training Data?
Datasets for training object recognition systems are steadily increasing in size. This paper investigates the question of whether existing detectors will continue to improve as data grows, or saturate in performance due to limited model complexity and the Bayes risk associated with the feature spaces in which they operate...
Jobs
Data Scientist - Localytics - Boston, MA
Build the future of mobile with Localytics. Named among the top places to work by The Boston Globe, we're empowering app publishers through predictive analytics and machine learning...
Training & Resources
A Full Hardware Guide to Deep Learning
One of the worst things you can do when building a deep learning system is to waste money on hardware that is unnecessary. Here I will guide you step by step through the hardware you will need for a cheap high performance system...
Journey from a Python noob to a Kaggler on Python
The aim of this page is to provide a comprehensive learning path to people new to Python for data analysis...
How-to: Tune Your Apache Spark Jobs (Part 1)
Learn techniques for tuning your Apache Spark jobs for optimal efficiency...
Books
Effective Python: 59 Specific Ways to Write Better Python
NEW RELEASE: Practical advice for each major area of development with Python...
"Effective Python is a time-efficient way to learn – or remind yourself – what the best practices are and why we use them. It’s a concise book of practical techniques to write maintainable, performant and robust code using practices widely accepted in the community..."
For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.
P.S. Enjoyed the newsletter? Please forward it along to friends and colleagues - we'd love to have them onboard! - All the best, Hannah & Sebastian