Data Science Weekly - Issue 130
Issue #130, May 19, 2016
Editor Picks
A guy just transcribed 30 years of for-rent ads. Here’s what it taught us about housing prices.
I don’t know anything about Eric Fischer except that he’s a freaking hero. Much like everyone else who has recently attempted to live in San Francisco, Fischer is very interested in housing costs. However, unlike every other such person, Fischer decided to contribute to this conversation by doubling the depth of modern historical data about them...
The Rise of the Data Natives
We are now witnessing a new revolution — that of data natives who expect their world to be “smart” and seamlessly adapt to their taste and habits...
Statistical Power Analysis
Imagine a scientist planning to run an experiment. A power analysis can help answer questions like: a) Will this experiment work, i.e. how likely is it to detect a statistically significant effect? b) How much data needs to be collected? c) What is the smallest effect this experiment can measure?... This visualization illustrates how assumptions about the data-generating process affect the likelihood of detecting a significant effect...
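For readers who want to try those three questions on their own numbers, here is a minimal Python sketch. It is not taken from the linked visualization: it assumes a two-sided, two-sample z-test with equal group sizes and a standardized effect size (Cohen's d), uses the usual normal approximation, and the function names are hypothetical, chosen for illustration only.

```python
from math import ceil, sqrt
from statistics import NormalDist

# Standard normal distribution, used for all quantiles and tail areas below.
STD_NORMAL = NormalDist()


def power_two_sample(effect_size, n_per_group, alpha=0.05):
    """(a) Probability of detecting a true effect of the given size.

    Assumes a two-sided, two-sample z-test with equal group sizes;
    effect_size is the mean difference in standard-deviation units.
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = effect_size * sqrt(n_per_group / 2)
    # Sum of both rejection regions under the alternative hypothesis.
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)


def sample_size_two_sample(effect_size, power=0.8, alpha=0.05):
    """(b) Approximate observations needed per group to reach the target power."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = STD_NORMAL.inv_cdf(power)
    return ceil(2 * ((z_crit + z_power) / effect_size) ** 2)


def minimum_detectable_effect(n_per_group, power=0.8, alpha=0.05):
    """(c) Approximate smallest standardized effect detectable at the target power."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = STD_NORMAL.inv_cdf(power)
    return (z_crit + z_power) * sqrt(2 / n_per_group)


if __name__ == "__main__":
    n = sample_size_two_sample(effect_size=0.5)  # 63 per group
    print("n per group:", n)
    print("power at that n:", round(power_two_sample(0.5, n), 3))
    print("smallest detectable effect:", round(minimum_detectable_effect(n), 3))
```

With a true effect of zero, the power function returns alpha itself, which is a quick sanity check that the "power" of a test with nothing to find is just its false-positive rate.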
A Message from this week's Sponsor:
Catenus Science Apprenticeship Program
The Catenus Science Apprenticeship Program identifies top data scientists who will raise the bar when hired at a startup. To help meet this goal, the program will train qualified candidates to have immediate, meaningful impact as data scientists in some of the top data startups in the world. This program will hone their skills in statistics, machine-learning, programming, and product development by presenting them with real-world challenges put forth by startups in Silicon Valley and the Bay Area.
We offer a fully-paid, 13-week apprenticeship during which we reinforce technical and business skills. We do this via a mix of formal instruction and hands-on application of data science in some of the best startups in the world...
Data Science Articles & Videos
Check Yo Self: 5 Things You Should Know About Data Science (Author Note)
Hey folks. I’m going to take a brief break from the narrative to talk to you directly about data science as a discipline. There’s a lot of noise floating around about how data scientists are the sexy saviors of the world. Well, we’re not...
Liberating Data from NYC Property Tax Bills
So there you have it… we turned 1.1 million PDFs into a high-quality open dataset on NYC property taxes, including all exemptions and abatements. Data scientists everywhere, go forth and crunch the numbers...
Shot Blocking in the NHL Playoffs
As a casual observer, it seemed to me like shot blocking was more prevalent during playoff games than the regular season. Intuitively, this would make sense since there is more on the line for each game, but I wanted to take a look at some data to see whether my suspicions were correct...
Wikipedia Navigation Vectors
Wikipedia Navigation Vectors: a semantic embedding of Wikipedia learned from 370M sessions. In this project, we learned embeddings for Wikipedia articles and Wikidata items by applying Word2vec models to a corpus of reading sessions...
The NYPD Was Systematically Ticketing Legally Parked Cars for Millions of Dollars a Year - Open Data Just Put an End to It
New York City is a complex place to drive. And when it comes to parking, there are plenty of rules and regulations to follow. It’s no wonder that sometimes people get confused and end up getting their cars ticketed or towed. But in all of these rules, there is one thing that very few drivers seem to know...
Feed-forward neural doodle
Sometimes you sigh that you cannot draw, don’t you? It takes time to master the skills, and you have more important things to do :) What if you could just sketch the picture like a 3-year-old and a computer did everything else, so your sketch looks like a real painting? We take a step towards making such things available to everybody and present an online demo of our fast algorithm...
How to create your own Machine Learning Predictive System in the NBA using Python
Which sports geek wouldn’t like to create their own system for predicting matches, whether to place bets or just out of intellectual curiosity? This is not going to be a comprehensive DIY guide. I’m just going to talk about what I found when playing with this stuff for a few months and share some code that should be useful for anyone who wants to get started...
Visualising Random Variables
When teaching mathematics, the traditional method of lecturing in front of a blackboard is still hard to improve upon, despite all the advances in modern technology. However, there are some nice things one can do in an electronic medium, such as this blog. Here, I would like to experiment with the ability to animate images, which I think can convey some mathematical concepts in ways that cannot be easily replicated by traditional static text and images...
Jobs
Data Scientist - Jawbone - San Francisco
Our mission is to deliver better living through data -- from wearable tech to best-in-class Bluetooth headsets and wireless speakers. We’re looking for data scientists who share our passion to transform massive amounts of data into products that delight our customers and improve their lives...
Training & Resources
Easier data analysis in Python with pandas (video series)
pandas is a powerful, open source Python library for data analysis, manipulation, and visualization. If you're working with data in Python and you're not using pandas, you're probably working too hard! In this video series, we'll focus on the functionality that is most important to master, as well as making use of the latest and greatest pandas features. You'll learn current best practices, and you can follow along with every video at home because all datasets used in the series are available online...
Modern Pandas (Part 7): Time Series
This is part 7 in my series on writing modern idiomatic pandas...
Impatient R
Impatient R: a well-done guide to getting started with the R programming language, for impatient, clever people...
Books
The Lady Tasting Tea: How Statistics Revolutionized Science in the Twentieth Century
An insightful, revealing history of how mathematics transformed our world...
"I have taken courses in statistics, taught it many times and solved several statistical problems that have appeared in journals. But until I read this book, I never really thought about it in so deep and philosophical a manner..."
For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.
P.S. Interested in reaching fellow readers of this newsletter? Consider sponsoring! Email us for details :) - All the best, Hannah & Sebastian