Data Science Weekly - Issue 137
Issue #137 July 7 2016
Editor Picks
Data Mining Novels Reveals the Six Basic Emotional Arcs of Storytelling
Scientists at the Computational Story Laboratory have identified the six emotional arcs that form the building blocks of all stories...
The history of R's predecessor, S, from co-creator Rick Becker
Before there was R, there was S. R was modeled on a language developed at AT&T Bell Labs starting in 1976 by Rick Becker and John Chambers (and, later, Alan Wilks) along with Doug Dunn, Jean McRae, and Judy Schilling. At last week's useR! conference, Rick Becker gave a fascinating keynote address, Forty Years of S...
Brexit polling: What went wrong?
The difference between survey and election outcome can be broken down into five terms...
A Message from this week's Sponsor:
Where science and policy change the world. And You.
Apply your knowledge & skills to federal policy via the AAAS Science & Technology Policy Fellowships. A year-long professional development opportunity for doctoral-level data scientists to serve in the federal government in Washington, D.C.
STPF fosters a career-enhancing network of science leaders who understand policymaking & contribute to society...
Data Science Articles & Videos
Google’s DeepMind AI to use 1 million NHS eye scans to spot diseases earlier
Google’s DeepMind division has announced a partnership with the NHS’s Moorfields Eye Hospital to apply machine learning to spot common eye diseases earlier. The five-year research project will draw on one million anonymous eye scans which are held on Moorfields’ patient database, with the aim to speed up the complex and time-consuming process of analysing eye scans...
Is It Brunch Time?
We begin by using the Twitter Streaming API. This API allows us to subscribe to search terms, for example “brunch”, and get any tweet matching that term sent to our program in real-time. Not only did we collect “brunch” tweets but we also collected tweets containing “breakfast”, “lunch”, and “dinner” to use as controls (which we will review later). We allowed the program to run from 2015-06-01 to 2016-05-31, which yielded 100M+ tweets for analysis...
Why Python is Slow: Looking Under the Hood
When I teach courses on Python for scientific computing, I make this point very early in the course, and tell the students why: it boils down to Python being a dynamically typed, interpreted language, where values are stored not in dense buffers but in scattered objects...But I realized something recently: despite the relative accuracy of the above statements, the words "dynamically-typed-interpreted-buffers-vectorization-compiled" probably mean very little to somebody attending an intro programming seminar...So I decided I would write this post, and dive into the details that I usually gloss over...
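A rough illustration of the "scattered objects versus dense buffers" point, not taken from the linked post: the sketch below compares the size of one boxed Python integer with one slot in a packed `array` buffer, and shows a name being rebound to a value of a different type. It assumes CPython; exact byte counts vary by version and platform, so none are hard-coded.

```python
import sys
from array import array

values = list(range(1000, 2000))  # 1,000 separate Python int objects

# A list holds pointers; every element is its own boxed object
# carrying a type pointer and a reference count alongside the value.
per_object_bytes = sys.getsizeof(values[0])

# array("q") stores raw signed 64-bit integers back to back in one buffer.
packed = array("q", values)
per_slot_bytes = packed.itemsize

print(f"boxed int object:  {per_object_bytes} bytes")
print(f"packed array slot: {per_slot_bytes} bytes")

# Dynamic typing: the same name can refer to values of different types,
# so the interpreter must inspect types at run time for each operation.
x = 1
type_before = type(x).__name__
x = "one"
type_after = type(x).__name__
print(type_before, "->", type_after)
```

The packed buffer is the layout that NumPy-style vectorized code relies on; the list of boxed objects is what plain Python loops walk through.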
Heavy Metal and Natural Language Processing - Part 1
Iain scraped and built a dataset of lyrics to 222,623 songs by 7,364 metal bands, then used traditional natural language processing techniques to analyze them...this post is a good tour through the natural language processor's toolkit -- bag of words, Bayesian filtering, log-likelihood ratio, term frequency-inverse document frequency (TF-IDF), cosine distance, etc. The output of the analysis is sometimes fun and interesting, but the value here is mostly as a good primer on how the different techniques work and when you might use them...
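For readers who have not met two of the techniques named above, here is a minimal sketch of TF-IDF weighting and cosine similarity (cosine distance is one minus this value). The three-line corpus is invented for illustration and is not from the lyrics dataset, and the weighting used, tf * log(N / df), is one common textbook variant rather than necessarily the one in the linked post.

```python
import math
from collections import Counter

# Toy corpus, invented for illustration only.
docs = [
    "fire and steel and thunder",
    "steel rain and black thunder",
    "quiet morning coffee and toast",
]

def tokenize(text):
    return text.lower().split()

def tf_idf_vectors(documents):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [tokenize(d) for d in documents]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

vectors = tf_idf_vectors(docs)
sim_metal = cosine_similarity(vectors[0], vectors[1])  # share "steel", "thunder"
sim_mixed = cosine_similarity(vectors[0], vectors[2])  # share only "and"
print(sim_metal, sim_mixed)
```

Because "and" appears in every document, its inverse document frequency is log(1) = 0, so it contributes nothing to similarity; that is the property that lets TF-IDF discount filler words.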
The Toronto Raptors Are Using IBM’s Watson to Draft A Winning Team
After falling to the eventual NBA champs during the Eastern finals, the Toronto Raptors are hungry for a championship title. Thursday’s draft will be crucial in crafting a winning lineup, and when it comes to deciding who makes the team, the Raptors will be able to consult their newest recruit: IBM’s Watson...
Spatializing 6,000 years of global urbanization from 3700 BC to AD 2000
Here, we developed the first spatially explicit dataset of urban settlements from 3700 BC to AD 2000, by digitizing, transcribing, and geocoding historical, archaeological, and census-based urban population data previously published in tabular form by Chandler and Modelski...
Going Beyond Full Utilization: The Inside Scoop On Nervana’s Winograd Kernels
This is part 2 of a series of posts on how Nervana uses the Winograd algorithm to make convolutional networks faster than ever before. In the first part we focused on benchmarks demonstrating a 2-3x algorithmic speedup. This part will get a bit more technical and dive into the guts of how the Winograd algorithm works, and how we optimized it for GPUs...
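As background for the Winograd idea, here is a sketch of the standard one-dimensional F(2,3) minimal filtering identity: two outputs of a 3-tap filter computed with 4 data-by-filter multiplications instead of 6. This is the textbook form only, an assumption about what is useful context; it is not Nervana's GPU kernel code, and the function names are made up for this example.

```python
def direct_fir(d, g):
    """Two outputs of a 3-tap filter over four inputs: 6 multiplications."""
    return [
        d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
        d[1] * g[0] + d[2] * g[1] + d[3] * g[2],
    ]

def winograd_f23(d, g):
    """Same two outputs using 4 multiplications between data and filter terms."""
    # Filter-side terms: these depend only on g, so they can be
    # computed once per filter and reused across all input tiles.
    u0 = g[0]
    u1 = (g[0] + g[1] + g[2]) / 2
    u2 = (g[0] - g[1] + g[2]) / 2
    u3 = g[2]

    # Data-side terms need only additions and subtractions.
    m1 = (d[0] - d[2]) * u0
    m2 = (d[1] + d[2]) * u1
    m3 = (d[2] - d[1]) * u2
    m4 = (d[1] - d[3]) * u3

    return [m1 + m2 + m3, m2 - m3 - m4]

d = [1.0, 2.0, 3.0, 4.0]
g = [2.0, 4.0, 6.0]
print(direct_fir(d, g))
print(winograd_f23(d, g))
```

Both calls compute the same pair of values. The saving is in multiplications, traded for extra additions; the 2D version used for convolutional layers nests this transform over rows and columns.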
clmtrackr - JavaScript Library For Precise Tracking Of Facial Features Via Constrained Local Models
clmtrackr is a javascript library for fitting facial models to faces in videos or images. It currently is an implementation of constrained local models fitted by regularized landmark mean-shift, as described in Jason M. Saragih's paper. clmtrackr tracks a face and outputs the coordinate positions of the face model as an array, following the numbering of the model below...
Jobs
Data Scientist - Rockstar - NYC
Rockstar Games (developers of Grand Theft Auto, Max Payne, Red Dead Redemption, L.A. Noire, Bully & more) is seeking an experienced data scientist to join our Analytics practice and help advance our business intelligence capabilities. Successful candidates will work with analytics and product leadership to assure that the most relevant real-time and historical data is identified, tracked, analyzed, and made actionable across all of our games...
Training & Resources
The Theorem Every Data Scientist Should Know
What is the Central Limit Theorem? Why is it important?...
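A small simulation makes the Central Limit Theorem concrete. The sketch below, which is not from the linked article, draws many samples from a uniform distribution (which is flat, not bell-shaped) and checks that the sample means cluster around 0.5 with a spread close to the predicted sigma / sqrt(n). The sample sizes and the seed are arbitrary choices for illustration.

```python
import math
import random
import statistics

def sample_means(n_samples, sample_size, rng):
    """Mean of `sample_size` Uniform(0, 1) draws, repeated `n_samples` times."""
    return [
        statistics.fmean(rng.random() for _ in range(sample_size))
        for _ in range(n_samples)
    ]

rng = random.Random(0)  # fixed seed so the run is repeatable
sample_size = 50
means = sample_means(2000, sample_size, rng)

observed_mean = statistics.fmean(means)
observed_sd = statistics.stdev(means)

# Uniform(0, 1) has mean 1/2 and variance 1/12, so the CLT predicts
# the sample mean has standard deviation sqrt(1/12) / sqrt(n).
predicted_sd = math.sqrt(1 / 12) / math.sqrt(sample_size)

# For a normal distribution, roughly 68% of values lie within one
# standard deviation of the mean.
within_one_sd = sum(
    abs(m - observed_mean) <= observed_sd for m in means
) / len(means)

print(f"mean of sample means: {observed_mean:.4f}")
print(f"observed sd: {observed_sd:.4f}, predicted sd: {predicted_sd:.4f}")
print(f"fraction within one sd: {within_one_sd:.3f}")
```

Increasing `sample_size` shrinks the spread of the means by a factor of sqrt(n), which is why averages of even very non-normal data can be treated as approximately normal.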
Basic Interactive Geospatial Analysis in Python
Geospatial analysis is a massive field with a rich history. Python has some pretty slick packages for working with geospatial data such as, but not limited to, Shapely, Fiona, and Descartes...GeoPandas sits on top of these packages and exposes a familiar Pandas-like API that makes a series of element-wise and aggregation methods (from the base packages) easy to apply to dataframes containing geometry data...
Data Science for Beginners video 1: The 5 questions data science answers
Get a quick introduction to data science from Data Science for Beginners in five short videos. This video series is helpful if you're interested in doing data science - or work with people who do data science - and you want to start with the most basic concepts...
Books
The Theory That Would Not Die: How Bayes' Rule Cracked the Enigma Code, Hunted Down Russian Submarines, and Emerged Triumphant from Two Centuries of Controversy
An enjoyable account of the history of Bayesian statistics from Thomas Bayes's first idea to the ultimate (near-)triumph of Bayesian methods in modern statistics...
For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.
P.S. Interested in reaching fellow readers of this newsletter? Consider sponsoring! Email us for details :)
All the best, Hannah & Sebastian