Data Science Weekly - Issue 250
Issue #250 Sept 6 2018
Editor Picks
Unsupervised machine translation: A novel approach to provide fast, accurate translations for more languages
Automatic language translation is important to Facebook as a way to allow the billions of people who use our services to connect and communicate in their preferred language. This new method opens the door to faster, more accurate translations for many more languages. And it may only be the beginning of ways in which these principles can be applied to machine learning and artificial intelligence...
Physicists hack the human visual system to create “ghost images”
A pioneering experiment will extend human vision to invisible wavelengths, say researchers...
Deep learning of aftershock patterns following large earthquakes
Besides offering better predictions, interpretations of the model suggest promising directions for new physical theories...
A Message from this week's Sponsor:
Mode Studio: SQL, Python, R, & charts in one platform
No more jumping between applications. Mode Studio is the analytics toolkit that brings everything together, and gets out of the way. Explore data in our SQL editor, and pass results to integrated Python or R notebooks for deeper exploration and visualization. You can also layer charts over results quickly with built-in visualization tools, and sharing is easy—just send the report URL to teammates when you're ready...
Data Science Articles & Videos
Desperate for Data Scientists - LinkedIn reports dramatically increasing shortage of data scientists across U.S.
What a difference a few years makes. In 2015, a LinkedIn snapshot of what it calls the skills gap—a mismatch between the skills workers have and the skills employers seek—showed a national surplus in the United States of people with data science skills; as of August 2018, LinkedIn data shows a dramatic shortage...LinkedIn calculates that, in August, employers were seeking 151,717 more data scientists than exist in the U.S...
Putting the Power of Kafka into the Hands of Data Scientists
Over a year ago, my fellow data infrastructure engineers and I broke ground on a total rewrite of our event delivery infrastructure. Our mission was to build a robust, centralized data integration platform tailored to the needs of our Data Scientists. The platform would be fully self-service, so as to maximize the Data Scientists’ autonomy and give them complete control over their event data. Ultimately, we delivered a platform that is revolutionizing the way Data Scientists interact with Stitch Fix’s data...
Adversarial Examples that Fool Computer Vision and Time-Limited Humans
Machine learning models are vulnerable to adversarial examples: small changes to images can cause computer vision models to make mistakes such as identifying a school bus as an ostrich. However, it is still an open question whether humans are prone to similar mistakes. Here, we address this question by leveraging recent techniques that transfer adversarial examples from computer vision models with known parameters and architecture to other models with unknown parameters and architecture, and by matching the initial processing of the human visual system. We find that adversarial examples that strongly transfer across computer vision models influence the classifications made by time-limited human observers...
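To make "small changes to the input flip the prediction" concrete, here is a toy sketch in plain Python. It is not the paper's method: it assumes a hypothetical four-feature linear classifier (the weights `w`, bias `b`, input `x` and step size `eps` are all made up for illustration) and applies a sign-of-gradient perturbation in the spirit of the fast gradient sign method.

```python
# Toy illustration only: a hypothetical linear classifier, not a real vision model.
# It predicts class 1 when score > 0 and class 0 otherwise.

def score(w, x, b):
    """Linear score w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def sign(v):
    """Return -1, 0 or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)

def perturb(w, x, eps):
    """Shift every feature by at most eps in the direction that lowers the score.

    For a linear model the gradient of the score with respect to x is w,
    so stepping against sign(w) is the sign-of-gradient step.
    """
    return [xi - eps * sign(wi) for wi, xi in zip(w, x)]

# Made-up weights and input, chosen so the arithmetic is exact in floating point.
w = [0.5, -0.25, 0.75, -0.5]
b = 0.0
x = [0.5, 0.25, 0.5, 0.25]
eps = 0.25

x_adv = perturb(w, x, eps)
original_label = int(score(w, x, b) > 0)
adversarial_label = int(score(w, x_adv, b) > 0)
print(original_label, adversarial_label)
```

No feature moves by more than `eps`, yet the predicted label changes. Real attacks on image models rely on the same effect spread over thousands of pixels, which lets the per-pixel change be far smaller.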
PyTorch implementations of algorithms for density estimation
PyTorch implementations of Masked Autoregressive Flow and some other invertible transformations from Glow: Generative Flow with Invertible 1x1 Convolutions and Density estimation using Real NVP...
Easy-to-make videos can show you dancing like the stars
Want to dance like a professional ballerina or strut like a rapper? A new machine-learning technique can transfer one person’s motion to another in a simple process...
Data Science and Robots
If you want to understand the inner workings of machine learning and deep learning models, there is a very robust set of posts and videos here by Brandon Rohrer, who, according to his Twitter profile, "coaxes answers out of big piles of numbers for Facebook, builds robot brains for fun"...
Making it easier to discover datasets
There are many thousands of data repositories on the web, providing access to millions of datasets; and local and national governments around the world publish their data as well. To enable easy access to this data, we [Google] launched Dataset Search, so that scientists, data journalists, data geeks, or anyone else can find the data required for their work and their stories, or simply to satisfy their intellectual curiosity...
A neural attention model for speech command recognition
Attention models are powerful tools to improve performance on natural language, image captioning and speech tasks. The proposed model establishes a new state-of-the-art accuracy of 94.1% on the Google Speech Commands dataset V1 and 94.5% on V2 (for the 20-commands recognition task), while still keeping a small footprint of only 202K trainable parameters...
Jobs
Data Scientist - Dataminr - NYC
We are Dataminr, the leading company that turns social media into real-time, actionable alerts. Our ability to find and deliver information faster than any traditional source has completely revolutionized how critical, relevant and actionable information reaches the news, finance, public sector and corporate security industries.
You're a dedicated Data Scientist who wants nothing more than to help us sort, analyze and deliver relevant information from terabytes of unstructured data in the social media space. You will build machine learning models to transform social media feeds into actionable items. You will also work with engineers, product managers and other teams to solve challenging problems with your data science skills...
Training & Resources
List All Tensor Names In A TensorFlow Graph
Learn how to use the TensorFlow Get Operations Operation to list all Tensor names in a TensorFlow graph, via a screencast video and full tutorial transcript...
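As a rough sketch of the idea (not the tutorial's own code), the snippet below builds a tiny graph and walks `Graph.get_operations()`, collecting the name of every output tensor. It assumes TensorFlow is installed; the graph and the names `a`, `b` and `c` are hypothetical and chosen only for illustration.

```python
import tensorflow as tf

# Build a small, explicit graph so there is something to inspect.
# The constants and the names "a", "b", "c" are illustrative assumptions.
graph = tf.Graph()
with graph.as_default():
    a = tf.constant(1.0, name="a")
    b = tf.constant(2.0, name="b")
    c = tf.add(a, b, name="c")

# Each operation in the graph can produce zero or more output tensors;
# tensor names take the form "<operation name>:<output index>".
tensor_names = [
    tensor.name
    for op in graph.get_operations()
    for tensor in op.outputs
]

for name in tensor_names:
    print(name)
```

Operations are returned in the order they were added to the graph, so the names appear in creation order.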
Civic and Political APIs, Data Sets, and Websites
Your guide to finding reliable data to help you build great civic and political tools...
Multi-GPU training with Estimators, tf.keras and tf.data
Helpful walkthrough of Multi-GPU training with Custom Estimators...
Books
Bayes Theorem: A Visual Introduction For Beginners
"This book takes what can be a daunting and complex subject and breaks it down with a series of easy to follow examples which build up to deliver a great overall explanation of how to use Bayes Theorem for basic analysis and even off-the-cuff critical thinking"...
For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.
P.S. Want to reach our audience / fellow readers? Consider sponsoring - grab a spot now; first come, first served!
All the best, Hannah & Sebastian