Data Science Weekly - Issue 274
Issue #274, Feb 21, 2019
Editor Picks
How We Created a Visual Search Engine for Hayneedle.com
This post explains the visual search project we did for Hayneedle, a home furnishings and decor retailer that is part of our larger Walmart family of brands. My team was tasked with creating a search experience for customers that would allow them to search for products purely through images instead of words...
Foundations Built for a General Theory of Neural Networks
Neural networks can be as unpredictable as they are powerful. Now mathematicians are beginning to reveal how a neural network’s form will influence its function...
How 20th Century Fox uses ML to predict a movie audience
Historically, movie studios have relied heavily on experience when deciding to invest in a particular script—but this can lead to huge risks, particularly when investing in new, original stories. The iterative and complex process of matching stories and audiences is something that Julie Rieger, President, Chief Data Strategist and Head of Media, and Miguel Campo-Rembado, SVP of Data Science, together with their team of data scientists at 20th Century Fox, decided to clarify with data...
A Message from this week's Sponsor:
Find A Data Science Job Through Vettery
Vettery specializes in tech roles and is completely free for job seekers. Interested? Submit your profile, and if accepted onto the platform, you can receive interview requests directly from top companies growing their data science teams.
Get started.
Data Science Articles & Videos
Keeping up with AI in 2019
The past year has been rich in events, discoveries and developments in AI. It is hard to sort through the noise to see if the signal is there and, if it is, what the signal is saying. This post attempts to do exactly that: I’ll try to extract some of the patterns in the AI landscape over the past year. And, if we are lucky, we’ll see how some of the trends extend into the near future...
Better Language Models and Their Implications
We’ve trained a large-scale unsupervised language model which generates coherent paragraphs of text, achieves state-of-the-art performance on many language modeling benchmarks, and performs rudimentary reading comprehension, machine translation, question answering, and summarization — all without task-specific training...
Data Science Foundations: Know your data. Really, really, know it
Know your data, where it comes from, what’s in it, what it means. It all starts from there. If there is one piece of advice that I consistently give to every data person that’s starting out, whether they are going to be an analyst, scientist, or visualizer, this is it. This is the hill I spend the majority of my time on even now, to the point of obsession. It is a deeeeeeep but eminently important rabbit hole...
Self driving remote-control car with Apache MXNet
Autonomous driving is one of the most high-profile applications of deep learning. Recently AWS announced DeepRacer, a fully autonomous 1/18th scale race car driven by reinforcement learning. In this post I’ll show newcomers to machine learning how to assemble their very own self-driving remote control car and use Apache MXNet to teach it to race on a track...
Face Editing Generative Adversarial Network with User's Sketch and Color
Editing photos of faces using basic sketches, and letting a GAN do the rest. Lets you add/change: earrings, glasses, hair style, dimples, & more...
Introducing BodyPix: Real-time Person Segmentation in the Browser with TensorFlow.js
We are excited to announce the release of BodyPix, an open-source machine learning model which allows for person and body-part segmentation in the browser with TensorFlow.js. With default settings, it estimates and renders person and body-part segmentation at 25 fps on a 2018 15-inch MacBook Pro, and 21 fps on an iPhone X...
Emergent Coordinated Multi-Agent Behaviors through Competition
We study the emergence of cooperative behaviors in reinforcement learning agents using a challenging competitive multi-agent soccer environment with continuous simulated physics. We demonstrate that decentralized, population-based training with co-play can lead to a progression in agents' behaviors: from random, to simple ball chasing, and finally showing evidence of cooperation...
Data science is different now
For the past couple years, I've been telling people who ask me for advice not to go into data science. Here's why: The data science job market is way oversaturated. Here's what they should do instead...
Competition
A$1 million on a journey to discovery with data
The Explorer Challenge starts 28 February. Can you find Australia’s next big mineral deposit? Apply your data science skills to help unearth the next generation of exploration in Australia, using real industry data.
Find out more and register: https://unearthed.link/EC_DSW
Jobs
Data Scientist - TRANZACT - Fort Lee, NJ or Raleigh, NC
Tranzact is a fast-paced, entrepreneurial company offering a well-rounded suite of marketing solutions to help insurance companies stay ahead of the competition. The Data Scientist will be solving the toughest problems at Tranzact by using data. More specifically, the role is responsible for gathering data, conducting analysis, building predictive algorithms and communicating findings to drive profitable growth and performance across Tranzact. Must have a strong grasp of the data structure, business needs, and statistical and predictive modeling...
Want to post a job here? Email us for details >> team@datascienceweekly.org
Training & Resources
TensorFlow squeeze: Use tf.squeeze to remove a dimension from Tensor
Learn how to use tf.squeeze to remove a size-1 dimension from a Tensor, for example to turn a Tensor with a singleton dimension into a plain vector, via a screencast video and full tutorial transcript...
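For readers who have not met the operation before, here is a minimal plain-Python sketch of what "squeezing" does to a shape. It is not TensorFlow's implementation: the helpers `shape_of` and `squeeze` are hypothetical names that work on regular nested lists, and they only mimic the default behaviour of tf.squeeze, which drops every dimension of size 1.

```python
def shape_of(nested):
    """Return the shape of a regular nested list, e.g. [[1, 2, 3]] -> (1, 3)."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0] if nested else None
    return tuple(shape)


def squeeze(nested):
    """Drop every axis of length 1 from a regular nested list."""
    if not isinstance(nested, list):
        return nested  # reached a scalar, nothing left to squeeze
    if len(nested) == 1:
        # This axis has length 1: skip it and keep squeezing below it.
        return squeeze(nested[0])
    # This axis is kept; squeeze each element independently.
    return [squeeze(item) for item in nested]


row = [[1, 2, 3]]            # shape (1, 3)
vector = squeeze(row)        # shape (3,)
print(shape_of(row), "->", shape_of(vector))
```

In TensorFlow itself the equivalent call is tf.squeeze(tensor), optionally with an axis argument to restrict which size-1 dimensions are removed.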
Meta-Learning in 50 Lines of JAX
An introduction to "what is meta-learning" and a tutorial on implementing MAML in 50 lines of JAX...
Faster Gradient Boosting Decision Trees with binned features
A LightGBM-style gradient boosting implementation in scikit-learn is ready for reviews/test driving...
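The idea behind "binned features" can be illustrated with a small plain-Python sketch: each continuous feature is mapped to a handful of integer bins up front, so split finding only has to consider bin boundaries instead of every distinct value. This is an illustration under simple assumptions, not the scikit-learn or LightGBM code; `quantile_bin_edges` and `bin_feature` are hypothetical helper names.

```python
from bisect import bisect_right


def quantile_bin_edges(values, n_bins):
    """Pick up to n_bins - 1 cut points at evenly spaced ranks of the sorted values."""
    ordered = sorted(values)
    edges = [ordered[k * len(ordered) // n_bins] for k in range(1, n_bins)]
    return sorted(set(edges))  # drop duplicate cut points from repeated values


def bin_feature(values, edges):
    """Map each raw value to the index of the bin it falls into."""
    return [bisect_right(edges, v) for v in values]


feature = [0.1, 0.4, 0.5, 0.9, 1.3, 2.0, 2.2, 7.5]
edges = quantile_bin_edges(feature, n_bins=4)
binned = bin_feature(feature, edges)
print(edges)   # candidate split thresholds
print(binned)  # small integer codes used in place of the raw values
```

With the feature reduced to a few integer codes, a tree builder can accumulate gradient statistics per bin and evaluate at most one candidate split per edge.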
Books
The Book of R: A First Course in Programming and Statistics
"The Book of R is a comprehensive, beginner-friendly guide to R, the world’s most popular programming language for statistical analysis. Even if you have no programming experience and little more than a grounding in the basics of mathematics, you’ll find everything you need to begin using R effectively for statistical analysis"...
For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.
P.S. Want to reach our audience / fellow readers? Consider sponsoring - grab a spot now; first come, first served!
All the best, Hannah & Sebastian