Data Science Weekly - Issue 49
Issue #49 Oct 30 2014
Editor Picks
On Starting a New Job
This blog post is a mixture of how to get into data science as well as how to leave academia for industry. I want to be clear that this is not my farewell letter to academia, but rather advice to other PhDs—especially in the social sciences—who are considering going into industrial data science.
Using Machine Learning to Expose Haters
I wanted to use data science / machine learning to identify and rank haters by their “hater” level throughout the internet. I started with Hacker News and I wanted to explain the how and what I’ve done so far...
The Data-Driven vs. Gut Feel Hyperbole Needs to Stop
Smart decision-making is more complicated than becoming ‘data-driven’, whatever that means exactly...
Data Science Articles & Videos
Statistical Machine Learning Methods to Auto-Map Landforms from Digital Data
Dr Ekkamol Vannametee of Utrecht University has been using multiple point geostatistics (MPS) to make significant breakthroughs in landform mapping automation. The MPS method uses statistical machine learning techniques to automatically interpret landforms in a way that mimics human reasoning...
Bayes' Rule in an Animated GIF
Many find [Bayes' Rule] counterintuitive; this animated GIF is meant to help.
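For readers who prefer numbers to pictures, here is a minimal sketch of the rule in Python. The scenario (a diagnostic test) and all three probabilities are illustrative assumptions of ours, not taken from the linked post, and the function name `posterior` is hypothetical.

```python
def posterior(prior, likelihood, false_positive_rate):
    """Return P(H | E) for a binary hypothesis H given evidence E.

    prior               = P(H)
    likelihood          = P(E | H)
    false_positive_rate = P(E | not H)
    """
    # Total probability of seeing the evidence at all.
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    # Bayes' rule: P(H | E) = P(E | H) * P(H) / P(E)
    return likelihood * prior / evidence


# Assumed numbers: 1% base rate, 90% sensitivity, 5% false positives.
p = posterior(prior=0.01, likelihood=0.90, false_positive_rate=0.05)
print(f"P(condition | positive test) = {p:.3f}")  # prints 0.154
```

Even with a fairly accurate test, the low base rate keeps the posterior near 15%, which is the kind of result many people find counterintuitive.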
The Dangers of Faith in Data
"Faith in a source of data grows in direct relation to your distance from the collection of it"...
Will Deep Learning Make Other Machine Learning Algorithms Obsolete? [Quora]
Every once in a while a new algorithm comes along and makes all the others in the same domain seem obsolete. Will deep learning do that to related algorithms (backpropagation NN, GMM, HMM, ...)?
The Three Breakthroughs That Have Finally Unleashed AI on the World
A picture of our AI future is coming into view...The AI on the horizon looks more like Amazon Web Services—cheap, reliable, industrial-grade digital smartness running behind everything, and almost invisible except when it blinks off...
List of 75 Most Popular Deep Learning Papers in the Bibliography
We’re showing the 75 most popular Deep Learning papers in the list below. Note that this blog post (and the other blog posts) and the deep learning bibliography itself are available on GitHub: github.com/memkite/DeepLearningBibliography ...
Foundations of Data Science [PDF]
These notes are a first draft of a book being written by Hopcroft and Kannan (Microsoft Research) and in many places are incomplete. However, the notes are in good enough shape to prepare lectures for a modern theoretical course in computer science.
Analyzing Text in Rapidminer - Part 2: Rotten Tomatoes Movie Reviews
In this blog, we’re going to build a slightly more sophisticated process than the last one, which we can use to scrape movie reviews from Rotten Tomatoes and analyze them in RapidMiner.
The Pothole Problem
Surely there is a more efficient way than having two chaps driving the motorways of the UK looking for holes in the ground? Couldn't the wonders of science, big data, machine learning, and perhaps the greatest thing on the Gartner hype cycle: the Internet of Things (IoT), come together in beautiful scientific and software symmetry to solve this glaring inefficiency?
Voices of Pro Bono Data Science: The Key
This is the first post in a four-part blog series highlighting voices of pro bono data science where we asked our volunteers and partner organizations to answer the question: "Why do you think pro bono data science can change the world?"
Jobs
Moore-Sloan Data Science Fellows
The NYU Center for Data Science invites applications for positions as Moore-Sloan Data Science Fellows. These positions are a prominent feature of the Moore-Sloan Data Science Environment at NYU, a multi-institutional effort funded in part by a generous grant from the Moore and Sloan Foundations...Appointments will be initially for two years, with an expectation of renewal for a third on satisfactory performance. Fellows will be offered a competitive salary and benefits, with funds to support research and travel. There is some flexibility about the start date, but September 1, 2015 is expected.
Training & Resources
The Design and Implementation of Probabilistic Programming Languages
This (free online) book explains how to implement probabilistic programming languages (PPLs) by lightweight embedding into a host language. We illustrate this by designing and implementing WebPPL, a small PPL embedded in JavaScript...
The Evolution of Boosting Algorithms - From Machine Learning To Statistical Modeling
The basic idea [of boosting] is to boost the accuracy of a weak classifying tool by combining various instances into a more accurate prediction...We highlight the methodological background and present the most common software implementations. Worked-out examples and corresponding R code can be found in the Appendix.
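To make the "combine weak classifiers" idea concrete, here is a minimal sketch of one classic boosting algorithm, AdaBoost, using one-dimensional threshold rules ("stumps") as the weak learners. This is our own illustration in Python rather than the paper's R code; the toy data set and all function names are assumptions chosen for the example.

```python
import math


def stump_predict(x, threshold, polarity):
    """Weak learner: predict `polarity` if x <= threshold, else the opposite."""
    return polarity if x <= threshold else -polarity


def fit_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) of the best stump."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump_predict(x, threshold, polarity) != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best


def adaboost(xs, ys, rounds):
    """Train `rounds` stumps, reweighting the examples after each one."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, threshold, polarity = fit_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # this stump's vote weight
        ensemble.append((alpha, threshold, polarity))
        # Misclassified points gain weight, correct ones lose weight.
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Weighted majority vote of all stumps."""
    score = sum(a * stump_predict(x, t, p) for a, t, p in ensemble)
    return 1 if score >= 0 else -1


# Toy data (assumed): no single threshold separates the two classes.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

single_error = fit_stump(xs, ys, [0.1] * 10)[0]
ensemble = adaboost(xs, ys, rounds=3)
boosted = [predict(ensemble, x) for x in xs]
print("best single stump error:", round(single_error, 2))
print("boosted predictions match labels:", boosted == ys)
```

On this toy data the best single stump still misclassifies three of the ten points, while the weighted vote of three stumps classifies all of them correctly.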
On Being a Data Skeptic - Cathy O’Neil
In this paper, I’ll make the case that the community of data practitioners needs more skepticism, or at least would benefit greatly from it, for the following reason: there’s a two-fold problem in this community. On the one hand, many of the people in it are overly enamored with data or data science tools. On the other hand, other people are overly pessimistic about those same tools.
Books
A Million First Dates: Solving the Puzzle of Online Dating
Thanks to the increasingly efficient algorithms that power these sites, dating has been transformed from a daunting transaction based on scarcity to one in which the possibilities are almost endless...
As journalist Dan Slater shows, online dating is changing society in more profound ways than we imagine. He explores how these new technologies, by altering our perception of what’s possible, are reconditioning our feelings about commitment and challenging the traditional paradigm of adult life...
For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.
P.S. Enjoyed the newsletter? Please forward it to friends and peers - we'd love to have them onboard too :-) - All the best, Hannah & Sebastian