Data Science Weekly - Issue 53
Issue #53 Nov 27 2014
Editor Picks
A classroom experiment in Twitter Bots and creativity
This semester, I’m teaching a class in Story Bots, which is really a programming class disguised as a journalism class. One of the assignments was to make a Twitter bot. They could do what they wanted with it, but it had to use Twitter and run on a simple cron job. It had to tweet, and they had to put their bot on GitHub when they were done (sans access keys). Here’s what they came up with...
Yahoo Labs' Algorithm Identifies Creativity in 6-Second Vine Videos
Nobody knew how to automatically identify creativity until researchers at Yahoo Labs began studying the Vine livestream...
Starting data analysis/wrangling with R: Things I wish I'd been told
I have been working quite intensively with R for the last half year, and thought I'd try to document and share a few tricks, and things I wish I'd known when I started out...
Data Science Articles & Videos
The Sphere of my Influence
An amazing visualization of the influencers of key people, built using Wikipedia...
Twitter "Exhaust" Reveals Patterns of Unemployment
Twitter data mining reveals surprisingly detailed socioeconomic indicators at a fraction of the cost of traditional data-gathering methods, say computational sociologists...
Online Dating: A Less Stupid Cupid?
We so far haven’t seen a Google or Netflix of online dating—no one company that has nailed it and run away with the category. And in technology, when a constantly changing gaggle of companies contend for pieces of the same market, it almost always means one thing: None of them are getting it right...
Parse Push Experiments: A/B Testing Best Practices and the Screencast
We recently launched Push Experiments, which lets you conduct A/B tests on push notification campaigns to identify the most engaging message variant. With Push Experiments, we wanted to make it easy to run successful A/B tests. In this post, we’ll discuss some of the statistical techniques we’re using behind the scenes. And catch a screencast showing you the ins and outs of the new tool below...
NBA Per Game Data Correlation-Scatter Matrix
An impressive R data visualization using d3.js, with scatter and correlation matrices covering every NBA per-game stat...
Detecting Barcodes in Images with Python and OpenCV
The goal of this blog post is to demonstrate a basic implementation of barcode detection using computer vision and image processing techniques...
Inheritance Patterns in Citation Networks Reveal Scientific Memes
Memes are the cultural equivalent of genes that spread across human culture by means of imitation. What makes a meme and what distinguishes it from other forms of information, however, is still poorly understood. Our analysis of memes in the scientific literature reveals that they are governed by a surprisingly simple relationship between frequency of occurrence and the degree to which they propagate along the citation graph...
The World Cup Problem Part 2: Germany v. Argentina
In the final match of the 2014 FIFA World Cup, Germany defeated Argentina 1-0. How much evidence does this victory provide that Germany had the better team? What is the probability that Germany would win a rematch?...
GTrendsR package to Explore Google trending for Field Dependent Terms
Ever have a toy you know is super cool but don’t know what to use it for yet? That’s GTrendsR for me. So I made up an activity to use it for, that’s related to my own interests...
Jobs
Metis - Data Science Lead (New York)
We are looking for a Data Science Lead to head the teaching of our data science bootcamps and to champion our growth in the field of data science education and training. We are seeking an experienced data scientist who loves and excels at analyzing and visualizing data to solve important problems, is passionate about teaching others to be premier data scientists, and is fun to be around and work with. We want a leader who can envision new opportunities for growth, attract talent to pursue those opportunities, and help create exceptional data science training products...
Training & Resources
MLconf SF
All the slides and videos from MLconf SF 2014...
Mocha Documentation
Mocha is a Deep Learning framework for Julia...
Brushfire open source release
Distributed decision tree ensemble learning in Scala...
Books
I Heart Logs: Event Data, Stream Processing, and Data Integration
A short book highlighting the importance of logs...
"An interesting and informative view about log data and systemic concerns dealing with log data. Jay talks about various aspects of log data such as duality with tables, distribution, coordination of distributed sources, and data integration. He does so by appealing to his experience at LinkedIn and the new developments in the big data technology community..."
For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.
P.S. Enjoyed the newsletter? Please forward it to friends and peers - we'd love to have them onboard too :-) - All the best, Hannah & Sebastian