Data Science Weekly - Issue 75
Issue #75, April 30, 2015
Editor Picks
An Algorithm Set To Revolutionize 3-D Protein Structure Discovery
A new way to determine 3-D structures from 2-D images is set to speed up protein structure discovery by a factor of 100,000...
A Statistical Analysis of the Work of Bob Ross
In total, Ross painted 381 works on the show, relying on a distinct set of elements, scenes and themes, and thereby providing thousands of data points. I decided to use that data to teach something myself: the important statistical concepts of conditional probability and clustering, as well as a lesson on the limitations of data...
Automatic Statistician & the Profoundly Desired Automation for Data Science
The Automatic Statistician project by Univ. of Cambridge and MIT is pushing ahead the frontiers of automation for the selection and evaluation of machine learning models. In general, what does automation mean to Data Science?...
Data Science Articles & Videos
The psychology of police sketches — and why they're usually wrong
A genetic algorithm is being applied to generate more efficient police sketches...
"People Who Like This Also Like ... "
A while ago a friend of mine asked me how I would go about building a 'People Who Like This Also Like ...' feature for a music startup he was working at. For each band or musician, he wanted to display a list of other artists that people might also be interested in...
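One common way to approach a feature like this is item-to-item similarity over "who likes what" data. The following is only a minimal sketch under that assumption, not the method used in the linked article: the `likes` pairs, the artist names, and the `also_like` helper are all hypothetical, and Jaccard similarity is just one of several reasonable scoring choices.

```python
from collections import defaultdict


def build_fans(likes):
    """Map each artist to the set of users who like that artist."""
    fans = defaultdict(set)
    for user, artist in likes:
        fans[artist].add(user)
    return fans


def also_like(artist, likes, top_n=3):
    """Rank other artists by Jaccard similarity of their fan sets."""
    fans = build_fans(likes)
    target = fans.get(artist, set())
    scores = []
    for other, other_fans in fans.items():
        if other == artist:
            continue
        overlap = len(target & other_fans)
        if overlap == 0:
            continue  # no shared fans, so nothing to recommend
        scores.append((other, overlap / len(target | other_fans)))
    # Highest similarity first; ties broken alphabetically for stable output.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:top_n]


# Hypothetical (user, artist) pairs, for illustration only.
likes = [
    ("u1", "A"), ("u1", "B"),
    ("u2", "A"), ("u2", "B"), ("u2", "C"),
    ("u3", "A"), ("u3", "C"),
    ("u4", "D"),
]
recommendations = also_like("A", likes)
print(recommendations)
```

Jaccard similarity normalizes by the combined audience, which keeps very popular artists from dominating every list simply because almost everyone likes them.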
Becoming the Expert - Interactive Multi-Class Machine Teaching
In this work we propose an Interactive Machine Teaching algorithm that enables a computer to teach challenging visual concepts to a human...
Study: NFL Teams Have No Idea What They're Doing In The Draft
Massey was contracted by an unnamed NFL team to study the history of the draft for market inequalities. He discovered something that won't come as a surprise to football fans: the draft is kind of a crapshoot...
When is Cheryl's Birthday?
Cheryl's puzzle was designed to be solved with a pencil, the greatest problem-solving tool in the history of mathematics (although some prefer a pen, chalk, marker, or a stick for drawing in the sand). But I will show how to solve it with another tool: computer code...
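To give a flavor of what solving the puzzle in code can look like, here is a minimal sketch. It is not the linked article's code: the ten candidate dates come from the widely circulated puzzle statement (Albert is told the month, Bernard the day), and the function names are our own. Each statement in the dialogue becomes a filter over the dates that are still possible.

```python
# Candidate dates from the widely circulated puzzle statement.
DATES = [
    ("May", 15), ("May", 16), ("May", 19),
    ("June", 17), ("June", 18),
    ("July", 14), ("July", 16),
    ("August", 14), ("August", 15), ("August", 17),
]


def with_month(month, dates):
    return [d for d in dates if d[0] == month]


def with_day(day, dates):
    return [d for d in dates if d[1] == day]


def knows(possibilities):
    """Someone knows the birthday when exactly one date remains possible."""
    return len(possibilities) == 1


def statement1(date):
    """Albert (told the month) doesn't know, and is sure Bernard doesn't either."""
    same_month = with_month(date[0], DATES)
    return not knows(same_month) and all(
        not knows(with_day(d[1], DATES)) for d in same_month
    )


def statement2(date):
    """Bernard (told the day) now knows, having heard statement 1."""
    return knows([d for d in with_day(date[1], DATES) if statement1(d)])


def statement3(date):
    """Albert now knows too, having heard statement 2."""
    return knows(
        [d for d in with_month(date[0], DATES) if statement1(d) and statement2(d)]
    )


def solve():
    return [
        d for d in DATES
        if statement1(d) and statement2(d) and statement3(d)
    ]


print(solve())
```

The design choice here is to model "knowing" as having exactly one candidate left, so each later statement only needs to re-filter using the earlier ones.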
Deep Learning Machine Solves the Cocktail Party Problem
Separating a singer’s voice from background music has always been a uniquely human ability. Not anymore...
Five Takeaways on the State of Natural Language Processing
Thoughts following the 2015 "Text By The Bay" Conference...
Amazon Machine Learning: use cases and a real example in Python
Here I would like to share my personal experience with this amazing technology, introduce some of the most important – and sometimes misleading – concepts of machine learning, and give this new AWS service a try with an open dataset in order to train and use a real-world AWS Machine Learning model...
Engineering the Hiring Process
At Karat, we are passionate about improving the effectiveness and efficiency of hiring. From time to time, we’ll post articles from our employees, advisors, and friends so they can share what they’ve learned from their personal hiring experiences. Daniel Tunkelang is an advisor to Karat. He’s worked at LinkedIn, Google, and Endeca in a variety of technical leadership roles, specializing in relevance engineering and data science...
Jobs
Research Scientist - Fitbit - San Francisco, CA
Fitbit is the leader in the explosive market of health and fitness wearables. We empower and inspire our users to lead healthier and more active lifestyles with simple and delightful products. We are building a world-class research team of hacker-scientist-types to dream up, prototype, and deliver shipping products. Research at Fitbit spans a big set of problems from hardware development to embedded signal processing algorithms to data mining, all with a twist of experimentation. We work in a dynamic and collaborative environment where the goal is to learn things quickly, iterate fast, and make awesome products...
Training & Resources
Rodeo: A data science IDE for Python
Today we're excited to introduce a new project: Rodeo. Rodeo is an IDE that's built expressly for doing data science in Python. Think of it as a lightweight alternative to the IPython Notebook...
Awesome-R: A curated list of the best add-ons for R
Qin Wenfeng has taken the trouble to curate the best add-ons to R in their list, awesome-R: A curated list of awesome R frameworks, packages and software. The list provides several (but not too many!) recommendations for R users in the areas of IDEs, data manipulation packages, database integration frameworks, machine learning suites, R-related websites, and much more...
Soren Macbeth - Data Science in Clojure
Yieldbot's data platform is built on top of Clojure. I will cover some of our experience using Clojure for data (sciencing and platform). I will also cover our two major open source projects: marceline, a Clojure DSL for Apache Storm/Trident, and flambo, a Clojure DSL for Apache Spark...
Books
Data Science from Scratch: First Principles with Python
NEW RELEASE: Learn how many of the most fundamental data science tools and algorithms work by implementing them from scratch...
"Good, grounds-up, guide on how to get started..."
For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.
P.S. Enjoyed the newsletter? Please forward it along to friends and colleagues - we'd love to have them onboard! - All the best, Hannah & Sebastian