Data Science Weekly - Issue 60
Issue #60 - Jan 15, 2015
Editor Picks
R is still hot, and getting hotter
It's been more than four years since I wrote the white paper "R is Hot" with the goal of introducing R to companies that need modern and flexible data analysis software. It's still the most-downloaded white paper on the Revolution Analytics website. But a lot has changed in the past four years: R's popularity has grown, and more and more companies are adopting R for various applications. So I decided to update the paper with the latest statistics on R usage and even more examples of how R is used in practice...
Raspberry Pi Engine Control with Real-Time Adaptive Extreme Learning Machine
This is a brief video of my first attempt at engine control with an adaptive Extreme Learning Machine algorithm I designed to predict near-chaotic Homogeneous Charge Compression Ignition (HCCI) combustion in real time...
Winning Artificial Intelligence for Angry Birds
This is our artificial intelligence that won the annual Angry Birds Artificial Intelligence Competition...
Data Science Articles & Videos
Machine learning for fraud detection (at Stripe)
Using data from across the Stripe network, we’ve developed a machine learning system that evaluates charges in real-time and blocks those that are almost certainly fraudulent. By analyzing hundreds of different characteristics pertaining to each payment, these algorithms have already shielded businesses on Stripe from millions of attempted fraudulent charges...
Deep Image: Scaling up Image Recognition
We present a state-of-the-art image recognition system, Deep Image, developed using end-to-end deep learning. On one of the most challenging computer vision benchmarks, the ImageNet classification challenge, our system has achieved the best result to date, with a top-5 error rate of 5.98%, a relative 10.2% improvement over the previous best result...
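A "relative" improvement measures the drop in error as a fraction of the previous error, not in absolute percentage points. The short Python sketch below checks the arithmetic; the two helper names are hypothetical, and the roughly 6.66% previous best is only what the quoted figures imply, since the excerpt does not name the earlier system or its exact score.

```python
def relative_improvement(previous_error: float, new_error: float) -> float:
    """Fractional reduction in error, measured against the previous error."""
    return (previous_error - new_error) / previous_error


def implied_previous_error(new_error: float, relative_gain: float) -> float:
    """Invert relative_improvement: recover the previous error from the new one."""
    return new_error / (1.0 - relative_gain)


# Figures quoted in the Deep Image summary: 5.98% top-5 error, 10.2% relative gain.
prev = implied_previous_error(5.98, 0.102)
print(f"implied previous best top-5 error: {prev:.2f}%")
print(f"relative improvement: {relative_improvement(prev, 5.98):.1%}")
```

In absolute terms the gap is under 0.7 percentage points, which is why results at this level are usually reported as relative reductions.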
PyData: The Next Generation
State of the union and open questions for Python, Big Data, Analytics, and so forth, for 2015 and beyond...
The Revolution in Astronomy Education: Data Science for the Masses
We address the impact of the emerging discipline of data science on astronomy education within two contexts: formal education and lifelong learners...
A brilliant visualization of population density across 9 cities
Here's a great graphic from LSE Cities that depicts population density in a much more granular and telling way than a simple summary statistic like people per square mile could...
Taste and Trust
A promise of social networking platforms is their ability to leverage your social network to provide useful recommendations. We are overwhelmed by the choices available to us, and historically we've relied on our social environment to help us navigate them. Yes, there’s a risk that personalizing our experiences based on our social networks will trap us in a filter bubble. But that’s not my biggest concern. I’m interested in the roles that taste and trust play in making recommendations useful...
Game Theorists Crack Poker
An 'essentially unbeatable' algorithm for the popular card game points to strategies for solving real-life problems without having complete information...
Facial Keypoints Detection
Our [Kaggle's] friend Daniel Nouri, founder of Natural Vision and top contender on the leaderboard, has written a tutorial blog post on this competition. The tutorial outlines how to use convolutional neural nets to detect facial keypoints on this competition's dataset...
The Three Kinds Of Data Science Project Exams That Show Up In A Data Science Interview
You've got an interview and you've found out an exam will be given to you. All that you've been told is that, at some point of your choosing in the next few weeks, you'll be given 4 hours to ingest and operate on a sizable data set using your programming language of choice. This is scary...
Jobs
Data Scientist - Lending Club - San Francisco
Lending Club is the world’s largest online marketplace connecting borrowers and investors... looking to hire a big data scientist. Data, and lots of it, is at the core of Lending Club’s business. We use our rapidly growing dataset to understand the market, make credit decisions, predict performance, optimize ROI, and define product strategy. We need a scientist who is excited about helping us build our next generation big data analytics platform that runs on Hadoop...
Training & Resources
Introducing practical and robust anomaly detection in a time series
Both last year and this year, we saw a spike in the number of photos uploaded to Twitter on Christmas Eve, Christmas and New Year’s Eve (in other words, an anomaly occurred in the corresponding time series). Today, we’re announcing AnomalyDetection, our open-source R package that automatically detects anomalies like these in big data in a practical and robust way...
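To give a feel for what "robust" means here, below is a minimal Python sketch of a median/MAD (median absolute deviation) outlier test. This is a deliberately simpler, swapped-in technique, not the algorithm inside Twitter's R package (AnomalyDetection is built on Seasonal Hybrid ESD, which also models seasonality). The function name, the 3.5 cutoff and the upload counts are all hypothetical illustrations.

```python
from statistics import median


def mad_anomalies(values, threshold=3.5):
    """Return indices whose robust (median/MAD) z-score exceeds threshold.

    The median and MAD are barely moved by a few extreme points, so the
    spikes being hunted do not inflate the baseline the way a mean and
    standard deviation would. No seasonality is modelled here.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # No spread at all: the robust z-score is undefined, so flag nothing.
        return []
    # 0.6745 rescales MAD so the score is comparable to a z-score for normal data.
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - med) / mad > threshold]


# Hypothetical daily photo-upload counts with two holiday-style spikes.
daily_uploads = [100, 102, 98, 101, 99, 250, 100, 97, 103, 240]
print(mad_anomalies(daily_uploads))
```

On this toy series only the two spikes (positions 5 and 9) are flagged; a seasonal series such as weekly traffic would need the trend and seasonal components removed first, which is the part S-H-ESD handles.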
ggplot2 1.0.0
As you might have noticed, ggplot2 recently turned 1.0.0. This release incorporated a handful of new features and bug fixes, but most importantly reflects that ggplot2 is now a mature plotting system and it will not change significantly in the future...
Top 77 R posts for 2014
Links to the 77 most-read R posts of 2014...
Books
Mastering 'Metrics: The Path from Cause to Effect
Recent release: connects the dots between mathematical formulas, statistical methods, and real-world use cases ...
"Clear, interesting and an enjoyable read. There are sufficient mathematical formulaic representations without overwhelming those who are new to the literature..."
For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.
P.S. Happy New Year to all! Wishing you a wonderful 2015 :)
- All the best, Hannah & Sebastian