Data Science Weekly - Issue 142
Issue #142, Aug 11, 2016
Editor Picks
Analysis of Trump's tweets confirms he writes only the (angrier) Android half
I don’t normally post about politics (I’m not particularly savvy about polling, which is where data science has had the largest impact on politics). But this weekend I saw a hypothesis about Donald Trump’s Twitter account that simply begged to be investigated with data...
The Rise of Artificial Unintelligence
Computers may one day be able to reason exactly as humans do, but will they ever be as dumb? I had always thought that was impossible. Now, however, I’m not so sure...
Crystal Ball for Corn Crop Yields Will Revolutionize Commodity Trading
TellusLabs is using NASA imagery, machine learning, and expert knowledge about vegetation to deliver accurate, in-season agricultural yield estimates...
A Message from this week's Sponsor:
Want to use your Python skills to break into Data Science?
In Springboard's Data Science Intensive Workshop - learn online with a personal mentor, build real-world data science projects and start participating in Kaggle competitions. Perfect for those with statistics and programming backgrounds. Spots for the next class (Aug 29th) are filling fast. Enroll now!
Data Science Articles & Videos
AI’s Language Problem
Machines that truly understand language would be incredibly useful. But we don’t know how to build them...
Why Ball Tracking Works for Tennis and Cricket but Not Soccer or Basketball
Following the examples of tennis and cricket, a new generation of ball-tracking algorithms is attempting to revolutionize the analysis and refereeing of soccer, volleyball, and basketball...
Automatic tagging using deep convolutional neural networks
We present a content-based automatic music tagging algorithm using fully convolutional neural networks (FCNs). The experiments show that mel-spectrogram is an effective time-frequency representation for automatic tagging and that more complex models benefit from more training data...
Goods: organizing Google’s datasets
You can (try and) build a data cathedral. Or you can build a data bazaar. By data cathedral I’m referring to a centralised Enterprise Data Management solution that everyone in the company buys into and pays homage to, making a pilgrimage to the EDM every time they want to publish or retrieve a dataset. A data bazaar on the other hand abandons premeditated centralised control...
Building a Data Pipeline with Airflow
In this blog post I'll set up a data pipeline that takes currency exchange rates, stores them in PostgreSQL and then caches the latest exchange rates in Redis...
Modeling Madly: Machine learning at hackathons
These past two hackathons I’ve taken on some slightly different challenges than people usually go after in a hackathon: developing new machine learning models. While I’ve been working on data science and machine learning systems for a while, I’ve found that trying to do so under extreme constraints can be a distinctly different experience...
Image Completion with Deep Learning in TensorFlow
This paper shows how to use deep learning for image completion with a DCGAN. This blog post is meant for a general technical audience with some deeper portions for people with a machine learning background...
Design Better Data Tables
Poor tables. Where did they go wrong? After being the bread and butter of the web for most of its early history, tables were cast aside by many designers for newer, trendier layouts. But while they might be making fewer appearances on the web these days, data tables still collect and organize much of the information we interact with on a day-to-day basis...
Jobs
Data Scientist - StreetEasy (Zillow) - New York, NY
The StreetEasy Economic Research team is looking for an outstanding Data Scientist to join us. You will be responsible for deriving fascinating insights on the New York housing market from terabytes of StreetEasy market and usage data. This role will require a candidate who can apply a breadth of tools, data sources and analytical techniques to answer a wide range of high-level questions and present the insights in a concise and effective manner. You'll work in an informal, collaborative atmosphere with a team of smart self-starters like yourself...
Training & Resources
Grouping in Pandas
Grouping data is an integral part of many data analysis projects. The functionality for grouping in pandas is vast, but can be tough to grasp initially. Have no fear...we will get through a short introduction together using some data from NYC's beloved bike share program, Citi Bike...
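For a quick taste of what grouping looks like, here is a minimal sketch. The trip records, column names, and values below are invented for illustration; they are not taken from the linked tutorial or from the real Citi Bike data. The sketch only uses the standard pandas `groupby` and `agg` calls to show the split-apply-combine idea.

```python
import pandas as pd

# Hypothetical bike-share trips (made-up columns and values, for illustration only)
trips = pd.DataFrame({
    "start_station": ["A", "A", "B", "B", "B"],
    "user_type": ["Subscriber", "Customer", "Subscriber", "Subscriber", "Customer"],
    "duration_min": [10, 30, 5, 15, 40],
})

# Split the rows by station, then compute trip count and mean duration per group
per_station = trips.groupby("start_station")["duration_min"].agg(["count", "mean"])

# Grouping on two keys yields a Series indexed by (station, user type)
per_station_user = trips.groupby(["start_station", "user_type"])["duration_min"].mean()

print(per_station)
print(per_station_user)
```

Grouping on a list of columns returns a result with a hierarchical index, so a single group can be looked up with a tuple such as `per_station_user[("B", "Subscriber")]`.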
A Beginner's Guide to Variational Methods: Mean-Field Approximation
This post is an introductory tutorial on Variational Methods. I will derive the optimization objective for the simplest of VB methods, known as the Mean-Field Approximation...
Forcats: Experimental package that helps you work with factors
Tools for working with categorical variables (factors)...
Books
Data Visualization with Python and JavaScript: Scrape, Clean, Explore & Transform Your Data
Learn how to turn raw data into rich, interactive web visualizations with the powerful combination of Python and JavaScript. With this hands-on guide, author Kyran Dale teaches you how to build a basic dataviz toolchain with best-of-breed Python and JavaScript libraries, including Scrapy, Matplotlib, Pandas, Flask, and D3, for crafting engaging, browser-based visualizations...
For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.
P.S. Interested in reaching fellow readers of this newsletter? Consider sponsoring! Email us for details :) - All the best, Hannah & Sebastian