Data Science Weekly - Issue 85
Issue #85, July 9, 2015
Editor Picks
What PhDs do wrong (and right!) when applying for Data Science jobs
I've been interviewing lots of folks transitioning from academia into industry data science jobs, and have collected some reflections on what we've learned and how we advise candidates to approach the problem...
Two Chatbots Talking To Each Other Is Absolutely Horrifying
Ultra Hal is an artificially intelligent chat bot that learns from past conversations, according to its creators at Zabaware. Having two of them talk may result in: mild sexism, communism, and random interjections of trivia...
What My Deep Model Doesn't Know...
I recently spent some time trying to understand why these deep learning models work so well – trying to relate them to new research from the last couple of years. I was quite surprised to see how close these were to my beloved Gaussian processes. I was even more surprised to see that we can get uncertainty information from these deep learning models for free – without changing a thing...
Data Science Articles & Videos
Ryd.io - Every block in New York City has a story to tell
I recently worked with the NYC Taxi data set to make a new map of Manhattan that shows different areas of drop off activity. Translation: When are rides coming in during the week and weekend? What does a map of NY look like if we were to cluster these locations together?...
The Model Complexity Myth
(or, Yes You Can Fit Models With More Parameters Than Data Points)...
Analysing galaxy images with artificial intelligence:
Astronomers teach a machine how to ‘see’
A team of astronomers and computer scientists at the University of Hertfordshire have taught a machine to 'see' astronomical images. The technique...allows galaxies to be automatically classified at high speed, something previously done by thousands of human volunteers in projects like Galaxy Zoo...
Twitter’s New AI Recognizes Porn So You Don’t Have To
Clément Farabet, a research scientist at New York University, built brain-like computing systems that identified objects in photos and videos...When Farabet and his startup (MadBits) joined Twitter last summer, Roetter—the company’s head of engineering—told them to build a system that could automatically identify NSFW images on its popular social network...
MetaBoot: A Machine Learning Framework Of Taxonomical Biomarker Discovery For Different Microbial Communities Based On Metagenomic Data
As more than 90% of species in a microbial community cannot be isolated and cultivated, metagenomic methods have become one of the most important ways to analyze a microbial community as a whole...MetaBoot combines the techniques of mRMR (minimal redundancy maximal relevance) and bootstrapping for the discovery of non-redundant biomarkers for microbial communities through mining of metagenomic data...
Big Data For A Big Problem: Putting Data To Work To Tackle Obesity
In recent decades, three things have grown at an astonishing rate: a) The obesity rate in our population, b) Health care costs, and c) The availability of health-related public-use data sets. The first two have always been close associates. It’s time we find innovative ways to bring the third along... If you are a data scientist, policy wonk, innovator, or hacker-at-large looking for a critical health problem to address, look no further...
Data Science: How Do We Get Started? – Part Three
Predicting the Outcome of March Madness
In the first part of this series, I discussed how important it is as a data scientist to ask the right questions when solving a new problem. In this post I will be digging into a specific example relating to the situation when a data scientist is presented with a prediction goal, but no data to support that goal. Specifically, we will be looking at trying to predict the outcome of the NCAA Men’s National Basketball Tournament known as March Madness...
The Supreme Court, Described as Machine Learning
The Supreme Court of the US is basically an ensemble of nine different highly trained neural net models that run a binary classification of whether a problem is “constitutional” or “not constitutional”. Each of these models was selected based on how well they had performed on an in-sample data set of previous, lower-level decisions...
How we scaled data science to all sides of Airbnb
over 5 years of hypergrowth
Five years ago, I joined Airbnb as its first data scientist...
Jobs
Data Scientist, Search and Discovery - Coursera - Mountain View, CA
We’re looking for a talented data scientist to help us build an amazing content discovery experience for bringing learners to the right content. In this role, you’ll be directly involved in the design, implementation, and evaluation of discovery products ranging from on-platform catalog search to course and content recommendation. Our ideal candidate is an independent, analytically-minded individual with strong product engineering skills, specific domain experience in the development of search engines, and a passion for bringing education to the world!...
Training & Resources
SciPy 2015: Scientific Computing with Python Conference
Videos from the conference talks...
My favourite papers from day one of ICML 2015
Aargh! How can I possibly keep all the amazing things I learnt at ICML today in my head?! Clearly I can’t. This is a list of pointers to my favourite papers from today, and why I think they are cool. This is mainly for my benefit, but you might like them too!...
Toyplot - The Kid-Sized Plotting Toolkit For Python With Grownup-Sized Goals
Toyplot fully embraces principles and best practices for clarity and aesthetics in data graphics that are well-established by the visualization community, yet sadly lacking in contemporary plotting libraries. Toyplot has beautiful color palettes and sensible default styling that minimize chartjunk and maximize data ink out of the box, not as afterthoughts or addons...
Books
Automate the Boring Stuff with Python: Practical Programming for Total Beginners
Recent release recommended by a couple of our readers...
"Introductions to Python are easy to find -- but at the end of the day most Python tutorials for beginners end up being the same lessons repackaged, often leaving the new programmer with gaping holes in how their newly acquired skills can be applied practically. This is not one of those books..."
For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.
P.S. Interested in reaching fellow readers of this newsletter? Consider sponsoring! Email us for details :) - All the best, Hannah & Sebastian