Data Science Weekly - Issue 76
Issue #76 - May 7, 2015
Editor Picks
Machine Learning for Emoji Trends
In October 2011, Apple added the emoji keyboard to iOS as an international keyboard. Since then, digital language has evolved such that nearly half of comments and captions on Instagram contain emoji characters. And earlier this week, Instagram also added support for emoji characters in hashtags, which allows people to tag and search content with their favorite emoji #🎉. In Part 1 of this blog post series, we will take a deep dive into emoji usage on Instagram. By applying machine learning and natural language processing techniques, we’ll discover the hidden semantics of emoji...
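The excerpt doesn't say which techniques Instagram used. One classic way to surface the "hidden semantics" of a token is distributional: tokens that appear in similar contexts tend to mean similar things. The minimal sketch below swaps in raw co-occurrence counts plus cosine similarity (not necessarily Instagram's method), and the captions are made-up toy data.

```python
import math
from collections import Counter, defaultdict

def cooccurrence_vectors(captions, targets):
    """For each target token (e.g. an emoji), count the other tokens sharing a caption with it."""
    vectors = defaultdict(Counter)
    for caption in captions:
        tokens = caption.lower().split()
        for target in targets:
            if target in tokens:
                for tok in tokens:
                    if tok != target:
                        vectors[target][tok] += 1
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Toy captions (hypothetical data)
captions = [
    "happy birthday 🎉 party",
    "party tonight 🎉 🎂",
    "happy birthday 🎂 cake",
    "rainy day 🌂 umbrella",
]
vectors = cooccurrence_vectors(captions, ["🎉", "🎂", "🌂"])
print(cosine(vectors["🎉"], vectors["🎂"]))  # about 0.72: shared party/birthday context
print(cosine(vectors["🎉"], vectors["🌂"]))  # 0.0: no shared context
```

With real data you would normally reduce these sparse counts to dense embeddings (e.g. word2vec-style training) before comparing them.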
How we’re using Machine Learning to fight shell selling
In this first post in an occasional series, we're taking a look at machine learning initiatives at WePay: the kinds of problems we use machine learning for, how we build technology to address them, and how the unique challenges of the payments industry shape our approach. We thought the best introduction would be to look at an actual fraud problem we face, shell selling, and how we built the algorithm we're now using to solve it...
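The excerpt doesn't describe WePay's model. As a generic illustration of supervised fraud scoring, here is a minimal logistic-regression sketch trained by full-batch gradient descent; the two features and the six labeled accounts are synthetic and hypothetical, not WePay data.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.5, epochs=2000):
    """Full-batch gradient descent on the average logistic (cross-entropy) loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this example: p(fraud) - true label
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def fraud_score(w, b, x):
    """Estimated probability that account x is fraudulent."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Synthetic accounts with two hypothetical features scaled to [0, 1]; label 1 = shell seller
X = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.1], [0.1, 0.9], [0.2, 0.8], [0.1, 0.7]]
y = [1, 1, 1, 0, 0, 0]
w, b = train_logistic(X, y)
print(fraud_score(w, b, [0.85, 0.15]))  # high score for a new account resembling the fraud examples
```

In practice, the hard parts are usually feature engineering and handling heavily imbalanced labels, which this toy example ignores.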
The Simple, Elegant Algorithm That Makes Google Maps Possible
Algorithms are a science of cleverness. A natural manifestation of logical reasoning (mathematical induction, in particular), a good algorithm is like a fleeting, damning snapshot into the very soul of a problem. A jungle of properties and relationships becomes a simple recurrence relation, a single-line recursive step producing boundless chaos and complexity. And to see through deep complexity, it takes cleverness...
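The excerpt doesn't name the algorithm. Assuming it is Dijkstra's shortest-path algorithm (the standard textbook answer for route finding), its "single-line recursive step" is the relaxation dist[v] = min(dist[v], dist[u] + w(u, v)). Here is a minimal sketch with a binary heap on a made-up road graph; production mapping systems layer many further optimizations on top of this idea.

```python
import heapq

def dijkstra(graph, source):
    """Shortest distances from source; graph maps node -> list of (neighbor, non-negative weight)."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry: u was already reached more cheaply
        for v, w in graph.get(u, []):
            nd = d + w
            # Relaxation: keep the shorter of the known path and the path through u
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# Hypothetical road graph with travel costs
roads = {"A": [("B", 4), ("C", 1)], "C": [("B", 2), ("D", 5)], "B": [("D", 1)]}
print(dijkstra(roads, "A"))  # {'A': 0, 'B': 3, 'C': 1, 'D': 4}
```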
Data Science Articles & Videos
Predictive Machine Learning — Behind The Scenes at Fliptop
At Fliptop, our data science team uses machine learning to create a range of predictive models. The Fliptop predictive platform, which we call Darwin, automatically builds an array of machine-learning models and applies dozens of statistical measures to determine which model is most predictive at each stage of the marketing and sales lifecycle. In this post, we'll offer a glimpse into how Darwin works, focusing specifically on the predictive lead scoring component of our platform. From there, we'll offer a few predictions of our own about where we see marketing technology going in the coming years...
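Darwin's internals aren't described in this excerpt. As a hedged illustration of "build several models, then keep the most predictive one", the sketch below scores a held-out set of leads with each candidate model and keeps the one with the highest ROC AUC. AUC is only one common measure, and the candidate scorers and lead fields (`opens`, `employees`) are hypothetical.

```python
def auc(scores, labels):
    """ROC AUC: probability a random positive outranks a random negative (ties count 1/2)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def pick_best_model(candidates, holdout, labels):
    """candidates maps name -> scoring function; returns (best name, AUC per name)."""
    results = {name: auc([score(x) for x in holdout], labels) for name, score in candidates.items()}
    best = max(results, key=results.get)
    return best, results

# Hypothetical candidate lead-scoring models and held-out leads (1 = converted)
candidates = {
    "email_opens": lambda lead: lead["opens"],
    "company_size": lambda lead: lead["employees"],
}
holdout_leads = [
    {"opens": 5, "employees": 10},
    {"opens": 4, "employees": 500},
    {"opens": 1, "employees": 50},
    {"opens": 0, "employees": 1000},
]
holdout_labels = [1, 1, 0, 0]
best, results = pick_best_model(candidates, holdout_leads, holdout_labels)
print(best, results)  # email_opens {'email_opens': 1.0, 'company_size': 0.25}
```

Evaluating on held-out data rather than the training set is what keeps this comparison from simply rewarding the most overfit model.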
Hacking Google Finance in Real-Time for Algorithmic Traders. (2) Pre-Market Trading
It has been over a year since I posted the "Hacking Google Finance in Real-Time for Algorithmic Traders" article. The idea remains the same as before; however, our goal this time is to monitor changes in stock prices provided by Google Finance in real time, before the market opens...
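The post's own Google Finance code isn't reproduced here, and the sketch below calls no Google service. Assuming you already receive (timestamp, price) quotes from some data source (hypothetical here), flagging pre-market moves relative to the previous close can be as simple as:

```python
def detect_moves(prev_close, quotes, threshold_pct=1.0):
    """Return (timestamp, price, pct_change) for every quote that moved at least
    threshold_pct percent away from the previous close. quotes is an iterable of
    (timestamp, price) pairs from a hypothetical feed."""
    alerts = []
    for ts, price in quotes:
        pct = (price - prev_close) / prev_close * 100.0
        if abs(pct) >= threshold_pct:
            alerts.append((ts, price, round(pct, 2)))
    return alerts

# Hypothetical pre-market quotes for one ticker
quotes = [("08:00", 100.5), ("08:05", 101.2), ("08:10", 98.9)]
alerts = detect_moves(100.0, quotes, threshold_pct=1.0)
print(alerts)  # [('08:05', 101.2, 1.2), ('08:10', 98.9, -1.1)]
```

In a live setting the list would be replaced by a polling loop or a streaming feed, with the same comparison applied to each new quote.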
Parallel Machine Learning with Hogwild!
In this blog post, I will explain what stochastic gradient descent (SGD) is and how thread locking has a very large effect on performance. I will attempt to explain how parallel algorithms for machine learning such as Hogwild! work, why they have transformed big data analytics, and how GraphLab Create not only adopts these techniques but also actively pushes the frontier of parallel machine learning algorithms...
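Hogwild!'s central idea is that SGD workers update shared parameters without any locking, accepting occasional overwritten updates in exchange for zero synchronization cost. Here is a minimal sketch for least-squares linear regression using Python threads. Note that CPython's GIL serializes bytecode, so this shows the lock-free update pattern rather than a real parallel speedup, and it is not GraphLab Create's implementation.

```python
import random
import threading

def hogwild_sgd(data, n_threads=4, epochs=50, lr=0.1, seed=0):
    """Lock-free parallel SGD for y ~ slope * x + intercept (Hogwild!-style).
    All threads read and write the shared weight list w with no lock."""
    w = [0.0, 0.0]  # shared parameters: [slope, intercept]

    def worker(thread_id):
        rng = random.Random(seed + thread_id)  # per-thread RNG for sampling examples
        for _ in range(epochs):
            for _ in range(len(data)):
                x, y = rng.choice(data)
                err = (w[0] * x + w[1]) - y  # may read weights another thread is updating
                w[0] -= lr * err * x          # unsynchronized in-place updates
                w[1] -= lr * err

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return w

# Noiseless toy data from y = 2x + 1
data = [(i / 10, 2 * (i / 10) + 1) for i in range(11)]
print(hogwild_sgd(data))  # close to [2.0, 1.0]
```

The approach works best when each update touches only a few parameters (sparse problems), so that collisions between threads are rare.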
Deep Learning for Decision Making and Control
A remarkable feature of human and animal intelligence is the ability to autonomously acquire new behaviors. This research is concerned with designing algorithms that aim to bring this ability to robots and simulated characters. In this talk, Levine will describe a class of guided policy search algorithms that tackle this challenge by transforming the task of learning control policies into a supervised learning problem, with supervision provided by simple, efficient trajectory-centric methods...
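Guided policy search involves considerably more than this, but the excerpt's central move (turning policy learning into supervised learning) can be sketched in a few lines: a trajectory-centric controller labels states with actions, and a policy is fit to those labels by regression. The teacher here (`teacher_action`) is a hypothetical linear feedback law, and the 1-D linear policy is an assumption for brevity; the talk concerns deep neural network policies.

```python
def teacher_action(s):
    # Hypothetical stand-in for a trajectory-centric controller's feedback law
    return -1.5 * s + 0.2

def fit_linear_policy(states, actions):
    """Least-squares fit of a 1-D linear policy a = k * s + c to supervised (state, action) pairs."""
    n = len(states)
    mean_s = sum(states) / n
    mean_a = sum(actions) / n
    cov = sum((s - mean_s) * (a - mean_a) for s, a in zip(states, actions))
    var = sum((s - mean_s) ** 2 for s in states)
    k = cov / var
    c = mean_a - k * mean_s
    return k, c

# The teacher provides the supervision: action labels for visited states
states = [i / 4 for i in range(-8, 9)]
actions = [teacher_action(s) for s in states]
k, c = fit_linear_policy(states, actions)
print(k, c)  # approximately -1.5 and 0.2
```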
Target acquired: Finding targets in drone and quadcopter video streams using Python and OpenCV
So perhaps it comes as no surprise, now that I’m 26 years old, that I felt the urge to get back into RC. But instead of cars, I wanted to do something that I had never done before — drones and quadcopters. Which leads us to the purpose of this post: developing a system to automatically detect targets from a quadcopter video recording...
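A common OpenCV-style heuristic for spotting square targets is to approximate each contour by a polygon (e.g. with `cv2.approxPolyDP`) and keep shapes with four vertices, a near-square bounding box, and high solidity (polygon area divided by convex hull area). The sketch below re-implements only that geometric filter in plain Python so it runs without OpenCV; the thresholds are assumptions, not values taken from the post.

```python
def polygon_area(pts):
    """Shoelace formula; pts is a list of (x, y) vertices in order."""
    area = 0.0
    for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]):
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0

def convex_hull(pts):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(pts))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def looks_like_square_target(pts, min_aspect=0.8, max_aspect=1.2, min_solidity=0.9):
    """Keep 4-vertex polygons with a near-square bounding box and few concavities."""
    if len(pts) != 4:
        return False
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    width, height = max(xs) - min(xs), max(ys) - min(ys)
    if height == 0:
        return False
    aspect = width / height
    hull_area = polygon_area(convex_hull(pts))
    if hull_area == 0:
        return False
    solidity = polygon_area(pts) / hull_area
    return min_aspect <= aspect <= max_aspect and solidity >= min_solidity

print(looks_like_square_target([(0, 0), (10, 0), (10, 10), (0, 10)]))  # True
print(looks_like_square_target([(0, 0), (10, 5), (0, 10), (3, 5)]))    # False: concave dart shape
```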
Image Scaling using Deep Convolutional Neural Networks
This past summer I interned at Flipboard in Palo Alto, California, where I worked on machine learning based problems, one of which was image upscaling. This post will show some preliminary results and discuss our model and its possible applications to Flipboard's products...
Scientists dramatically improve method for finding common genetic alterations in tumors
St. Jude Children's Research Hospital scientists have developed a significantly better computer tool for finding genetic alterations that play an important role in many cancers but were difficult to identify with whole-genome sequencing...
Why Topological Data Analysis Works
Topological data analysis has been very successful in discovering information in many large and complex data sets. In this post, I would like to discuss the reasons why it is an effective methodology...
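The excerpt doesn't say which TDA construction the post relies on. One basic building block is 0-dimensional persistent homology: grow balls around every point and record the scale at which connected components merge. Components that survive over a wide range of scales reflect robust structure rather than noise. Here is a minimal union-find sketch on a toy point cloud (Python 3.8+ for `math.dist`).

```python
import math
from itertools import combinations

def h0_persistence(points):
    """0-dimensional persistence of a point cloud under a Vietoris-Rips filtration.
    Every point is born at scale 0; a component dies at the edge length where it merges
    into another. Returns the finite death scales in increasing order (one component never dies)."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Process all pairwise edges from shortest to longest
    edges = sorted(
        (math.dist(p, q), i, j) for (i, p), (j, q) in combinations(enumerate(points), 2)
    )
    deaths = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(length)
    return deaths

# Two tight toy clusters far apart: short-lived merges, then one long-lived gap
print(h0_persistence([(0, 0), (0, 1), (10, 0), (10, 1)]))  # [1.0, 1.0, 10.0]
```

The jump from 1.0 to 10.0 is the kind of signal TDA reads as "two clusters", independent of any particular coordinate system.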
How I Became Chief Data Scientist
I’m the U.S. Chief Data Scientist — and I got my start in community college. Yes, I’ve got a Ph.D. in applied mathematics, have been fortunate to help build amazing companies, and have been at the forefront of the data science movement. But the critical first step in that journey started at De Anza Jr. College in Cupertino, California...
Jobs
Data Scientist - Electronic Arts - Redwood City, CA EA is seeking a Data Scientist for its Red Crow studio, reporting to the studio Product Manager. We are looking for a professional with advanced statistical analysis skills. This role requires a passion for understanding players’ behavior through data, high attention to detail and data integrity... focus on analyzing large sets of data surrounding acquisition, engagement, and monetization and helping to automate this process for game teams, analysts and product managers. The individual should have a desire for data mining, scripting, problem solving and statistical analysis. This person will preferably have a strong interest in gaming (specifically mobile or social) or a fast-paced company where data is core to its operations. ...
Training & Resources
Neon: Python-based, deep learning framework from Nervana Systems
Nervana's Python-based, open source Deep Learning Framework...
10 Common NLP Terms Explained for the Text Analysis Novice
If you’re relatively new to the NLP and Text Analysis world, you’ve more than likely come across some pretty technical terms and acronyms that are challenging to get your head around, especially if you’re relying on scientific definitions for a plain and simple explanation. We decided to put together a list of 10 common terms in Natural Language Processing, broken down in layman's terms to make them easier to understand...
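The excerpt doesn't list the ten terms. Three that commonly appear on such lists are tokenization, n-grams, and TF-IDF; assuming those, here is a minimal plain-Python sketch of each. The regex tokenizer and the unsmoothed idf = log(N / df) are simplifying choices, and real NLP libraries differ in the details.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Tokenization: lowercase, then split into runs of letters, digits, and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())

def ngrams(tokens, n):
    """N-grams: every run of n consecutive tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def tf_idf(docs):
    """TF-IDF: one {term: weight} dict per document, using raw term counts times log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    df = Counter(term for toks in tokenized for term in set(toks))  # documents containing each term
    n_docs = len(docs)
    return [{t: c * math.log(n_docs / df[t]) for t, c in Counter(toks).items()} for toks in tokenized]

print(tokenize("The cat sat."))               # ['the', 'cat', 'sat']
print(ngrams(["the", "cat", "sat"], 2))       # [('the', 'cat'), ('cat', 'sat')]
print(tf_idf(["the cat sat", "the dog sat"])[0])  # 'the' and 'sat' get 0.0; 'cat' gets log(2)
```

Words that occur in every document get zero weight, which is why TF-IDF highlights the terms that distinguish one document from the rest.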
Apache Zeppelin
A web-based notebook that enables interactive data analytics...
Books
Data Science from Scratch: First Principles with Python NEW RELEASE: Learn how many of the most fundamental data science tools and algorithms work by implementing them from scratch...
"Good, grounds-up, guide on how to get started..."
For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.
P.S. Enjoyed the newsletter? Please forward it along to friends and colleagues - we'd love to have them on board! - All the best, Hannah & Sebastian