[in case you missed it] Data Science Weekly - Issue 302
Issue #302 Sept 5 2019
Editor Picks
Yann LeCun: Deep Learning, Convolutional Neural Networks, and Self-Supervised Learning | AI Podcast
Yann LeCun is one of the fathers of deep learning, the recent revolution in AI that has captivated the world with the possibility of what machines can learn from data. He is a professor at New York University, a Vice President & Chief AI Scientist at Facebook, and a co-recipient of the Turing Award for his work on deep learning. He is probably best known as the founding father of convolutional neural networks, in particular their early application to optical character recognition. This conversation is part of the Artificial Intelligence podcast...
How Much Do Data Scientists Make?
We Use H1B Salary Data to Explore the Salaries of Data Scientists
I've written a lot about the data science profession lately — both in terms of how to break in and whether the analytics industry itself is vulnerable to the automation trend that it is spearheading. Today we will cover the juiciest topic of all, compensation. If the market for data scientists is so hot, then just how much exactly are they being paid?...
An unexpected use for face recognition: tracking chimpanzees
It took a deep learning algorithm a fraction of a second to identify chimps and classify their genders. Expert human labelers were given nearly an hour to complete the same task...
A Message from this week's Sponsor:
Data scientists are in demand on Vettery
Vettery is an online hiring marketplace that's changing the way people hire and get hired. Ready for a bold career move? Make a free profile, name your salary, and connect with hiring managers from top employers today.
Data Science Articles & Videos
nobody knows you're a bot
Every week, the New Yorker magazine runs a caption contest. I’ve entered this contest, unsuccessfully, dozens of times. The problem is that I’m not very funny. But computers? Computers are funny as hell. What if I could have one write captions for me?...
Deep learning enables rapid identification of potent DDR1 kinase inhibitors
Model generates a novel drug in 21 days, & after 25 more days of synthesis & testing, its predicted biological & chemical properties were confirmed as favorable in mice...
Visualizing Personality Profile of A Film Character Using Python & IBM Watson
A simple guide to data collection, light preprocessing and visualization of the big five personality traits of any film character...
Creating a data set and a challenge for deepfakes
Facebook is partnering with industry leaders and academic researchers to create the DeepFake Detection Challenge, a collaborative effort to build new tools to detect videos that have been manipulated with AI...
Credit Card Fraud Detection
What can we do to mitigate the risk? While there are a lot of methods to limit the loss and prevent fraud, I’ll walk you through my process and show you my findings. To start, I gathered my data from a Kaggle dataset which contained 285,000 rows of data and 31 columns...
Smaller, faster, cheaper, lighter: Introducing DistilBERT, a distilled version of BERT
We decided to focus on distillation: a technique you can use to compress a large model, called the teacher, into a smaller model, called the student...
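As a rough illustration of the teacher/student idea, the sketch below computes one common form of distillation loss: the cross-entropy between temperature-softened teacher and student output distributions. This is a generic formulation, not necessarily the objective used for DistilBERT; the function names, the example logits, and the temperature value are all assumptions made for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Cross-entropy between softened teacher and student distributions.

    Hypothetical helper: the student is penalized for diverging from
    the teacher's full output distribution, not just its top prediction.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(p_teacher, p_student))

# Made-up logits for a 3-class problem.
teacher = [2.0, 1.0, 0.1]
close_student = [1.8, 1.1, 0.2]
far_student = [0.1, 1.0, 2.0]

print(distillation_loss(teacher, close_student))
print(distillation_loss(teacher, far_student))
```

A student whose outputs track the teacher's gets a lower loss than one that disagrees, which is the signal a training loop would minimize.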
Diverse Image Synthesis from Semantic Layouts via Conditional IMLE
In this paper, we focus on the problem of generating images from semantic segmentation maps and present a simple new method that can generate an arbitrary number of images with diverse appearance for the same semantic layout. Unlike most existing approaches which adopt the GAN framework, our method is based on the recently introduced Implicit Maximum Likelihood Estimation (IMLE) framework. Compared to the leading approach, our method is able to generate more diverse images while producing fewer artifacts despite using the same architecture...
ML Best Practices: Test Driven Development at Latent Space
I sat down with the Latent Space team to talk about best practices around collaboration and managing model iteration...
Quantity Versus Quality In Your Data Science Portfolio
A prospective employer will look at your online profiles. This will help them get a bigger picture of you than what was in the resume and profile you filled out for your data science job application. When they find your data science portfolio they will judge your work. Which brings up a very important question - should you focus on quality or quantity?...
You might be interested*
Introducing the first platform for investing in blue-chip art
The $1.7T art market has long been one of the most lucrative investment opportunities for the ultra-wealthy. However, these paintings are expensive to acquire and the art market is generally burdened by high fees and lack of transparency. With a revolutionary legal structure, Masterworks is disrupting one of the largest unregulated financial markets in the world. Now, anyone can invest in masterpiece paintings by artists like Warhol, Monet, and Banksy.
Explore our exclusive investments...
*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!
Jobs
Senior Data Scientist - 12traits - Berlin (Germany) and Los Angeles (USA)
At 12traits we believe the path to a better future lies in unlocking the true potential of humankind. We do this through harmonizing human behavior data with neuropsychological data, leveraging a multitude of scientifically valid data perspectives that allow us to understand human beings more deeply and sustainably than ever before: ushering in a new era of personalized experiences, starting in gaming, optimized for human potential and health.
As senior data scientist, you’ll work closely with psychometricians, engineers, as well as UX designers and researchers to actualize the potential derived from combining some of the richest behavioral data sets available with the largest database of highly valid cognitive data. You'll not only get to push your own boundaries, but the boundaries of artificial intelligence...
Want to post a job here? Email us for details >> team@datascienceweekly.org
Training & Resources
Multi-Object Datasets
This repository contains datasets for multi-object representation learning, used in developing scene decomposition methods like MONet and IODINE...
Introducing Neural Structured Learning in TensorFlow
We are excited to introduce Neural Structured Learning in TensorFlow, an easy-to-use framework that both novice and advanced developers can use for training neural networks with structured signals. Neural Structured Learning (NSL) can be applied to construct accurate and robust models for vision, language understanding, and prediction in general...
How To Make Custom AI-Generated Text With GPT-2
I've written a (lengthy!) blog post on how to finetune GPT-2 and generate text using gpt-2-simple, along with a history of GPT-2 finetuning and its future...
Books
Python Crash Course: A Hands-On, Project-Based Introduction to Programming
Thorough introduction to programming with Python...
"I have read multiple beginner guides to Python. I am currently up to chapter 11 in Python Crash Course. So far this is far and away my favorite Python programming book..."
For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.
P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :)
All the best,
Hannah & Sebastian