Data Science Weekly - Issue 238
Issue #238 June 14, 2018
Editor Picks
Why did the Neural Network cross the road?
Can a machine learning algorithm learn to tell a joke? I’ve experimented with neural networks and jokes before, teaching them to tell knock-knock jokes, or to generate April Fools pranks. In each case, the results were underwhelming. However, that could have been because the algorithm didn’t have much data to work with, just a couple of hundred examples of each type of joke. What happens when I give a neural network a LOT of examples to copy?...
Why the Future of Machine Learning is Tiny
I’m convinced that machine learning can run on tiny, low-power chips, and that this combination will solve a massive number of problems we have no solutions for right now. That’s what I’ll be talking about at CogX, and in this post I’ll explain more about why I’m so sure...
Machine learning predicts World Cup winner
Researchers have predicted the outcome after simulating the entire soccer tournament 100,000 times...
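For readers curious what "simulating a tournament many times" can look like in practice, here is a minimal Monte Carlo sketch. It is not the researchers' model: the four teams, their strength ratings, and the win-probability rule (strength divided by the sum of both strengths) are all invented for illustration, and the bracket is a simple single-elimination knockout.

```python
import random
from collections import Counter

# Hypothetical strength ratings, invented for illustration only.
STRENGTHS = {"Team A": 4.0, "Team B": 3.0, "Team C": 2.0, "Team D": 1.0}


def play_match(a, b, rng):
    """Return the winner; `a` wins with probability s_a / (s_a + s_b)."""
    p_a = STRENGTHS[a] / (STRENGTHS[a] + STRENGTHS[b])
    return a if rng.random() < p_a else b


def simulate_knockout(teams, rng):
    """Play single-elimination rounds until one team remains.

    Assumes the number of teams is a power of two.
    """
    remaining = list(teams)
    while len(remaining) > 1:
        remaining = [
            play_match(remaining[i], remaining[i + 1], rng)
            for i in range(0, len(remaining), 2)
        ]
    return remaining[0]


def estimate_win_probabilities(n_runs=100_000, seed=0):
    """Estimate each team's title probability as its share of simulated wins."""
    rng = random.Random(seed)
    teams = list(STRENGTHS)
    wins = Counter(simulate_knockout(teams, rng) for _ in range(n_runs))
    return {team: wins[team] / n_runs for team in teams}


if __name__ == "__main__":
    results = estimate_win_probabilities()
    for team, p in sorted(results.items(), key=lambda kv: -kv[1]):
        print(f"{team}: {p:.3f}")
```

The more runs you simulate, the closer each share gets to the probability implied by the assumed ratings, which is why a large count such as 100,000 is attractive.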
A Message from this week's Sponsor:
Quick Question For You: Do you want a Data Science job?
After helping hundreds of readers like you get Data Science jobs, we've distilled all the real-world-tested advice into a self-directed course.
The course is broken down into three guides:
Data Science Getting Started Guide. This guide shows you how to figure out the knowledge gaps that MUST be closed in order for you to become a data scientist quickly and effectively (as well as the ones you can ignore)
Data Science Project Portfolio Guide. This guide teaches you how to start, structure, and develop your data science portfolio with the right goals and direction so that you are a hiring manager's dream candidate
Data Science Resume Guide. This guide shows how to make your resume promote your best parts, what to leave out, how to tailor it to each job you want, as well as how to make your cover letter so good it can't be ignored!
Data Science Articles & Videos
AI could get 100X more energy-efficient with IBM’s new artificial synapses
Copying the features of a neural network in silicon might make machine learning more usable on small devices like smartphones...
Data science vs. statistics: two cultures?
Exploring what data science means for the modern discipline of statistics...
Improving Language Understanding with Unsupervised Learning
We’ve obtained state-of-the-art results on a suite of diverse language tasks with a scalable, task-agnostic system, which we’re also releasing. Our approach is a combination of two existing ideas: transformers and unsupervised pre-training. These results provide a convincing example that pairing supervised learning methods with unsupervised pre-training works very well...
Graph Convolutional Neural Networks for Web-Scale Recommender Systems
Here we describe a large-scale deep recommendation engine that we developed and deployed at Pinterest. We develop a data-efficient Graph Convolutional Network (GCN) algorithm PinSage, which combines efficient random walks and graph convolutions to generate embeddings of nodes (i.e., items) that incorporate both graph structure as well as node feature information. Compared to prior GCN approaches, we develop a novel method based on highly efficient random walks to structure the convolutions and design a novel training strategy that relies on harder-and-harder training examples to improve robustness and convergence of the model...
Physicist Max Tegmark on the promise and pitfalls of artificial intelligence
Tegmark recently spoke about AI’s potential — and its dangers — at IPsoft’s Digital Workforce Summit in New York City. After the keynote address, we spoke via phone about the challenges around AI, especially as they relate to autonomous weapons and defense systems like the Pentagon’s controversial Project Maven program. Here’s an edited transcript of the interview...
Sentiment analysis: 10 applications and 4 services
Overview of the services offered by Amazon, Google, Microsoft and IBM. Among other resources, the article includes sample code, benchmark results and a list of sentiment analysis applications...
Through-Wall Human Pose Estimation Using Radio Signals
A researcher at MIT has been developing technology for detecting people and their movements behind a solid wall using radio waves. The approach relies on cutting-edge machine learning to interpret the signals...
The Trouble with D3
Recently there were a couple of threads on Twitter discussing the difficulties associated with learning d3.js. I’ve also seen this come up in many similar conversations I’ve had at meetups, conferences, workshops, mailing list threads and slack chats. While I agree that many of the difficulties are real, the threads highlight a common misconception that needs to be cleared up if we want to help people getting into data visualization...
Jobs
eCommerce Data Science & Machine Learning Analyst - PepsiCo - NYC
Have a strong opinion about TensorFlow lacking an autoregressive dynamic network? So do we!
PepsiCo’s eCommerce Data Science and Analytics group is a team of data scientists, technology specialists, and business innovators who operate within eCommerce to build industry-leading systems and solutions. By focusing on machine learning and automation, the Data Science & Analytics group is pushing the bounds of possibility for PepsiCo and its strategic partners...
Training & Resources
Visualize TensorFlow Graph In TensorBoard
Learn how to use TensorFlow Summary File Writer (tf.summary.FileWriter) and the TensorBoard command line utility to visualize a TensorFlow Graph in the TensorBoard web service, via a screencast video and full tutorial transcript...
Data Cleaning with Python and Pandas: Detecting Missing Values
According to IBM Data Analytics you can expect to spend up to 80% of your time cleaning data. In this post we’ll walk through a number of different data cleaning tasks using Python’s Pandas library. Specifically, we’ll focus on probably the biggest data cleaning task, missing values...
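As a taste of what detecting missing values can look like, here is a small sketch. It is not taken from the linked post: the toy DataFrame, its column names, and the "n/a" placeholder are assumptions made up for illustration. It uses only standard Pandas calls (`isnull`, `sum`, `replace`).

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column names and values are invented for illustration.
df = pd.DataFrame({
    "price": [100.0, np.nan, 250.0, 80.0],
    "rooms": [3, 2, np.nan, 1],
    "owner": ["Y", "n/a", "N", None],
})

# Standard missing values (NaN / None) are flagged by isnull().
missing_per_column = df.isnull().sum()

# Placeholder strings such as "n/a" are NOT flagged until converted to NaN.
cleaned = df.replace("n/a", np.nan)
missing_after_cleanup = cleaned.isnull().sum()

print(missing_per_column)
print(missing_after_cleanup)
```

The point of the second step is that hand-typed placeholders hide from the default check, so the "owner" count rises once they are converted.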
datasheets: Read data from, write data to, modify formatting of Google Sheets
Library for interfacing with Google Sheets. It is built on top of Google's google-api-python-client and oauth2client libraries using the Google Drive v3 and Google Sheets v4 REST APIs...
Books
Test-Driven Machine Learning
The book begins with an introduction to test-driven machine learning and quantifying model quality. From there, you will test a neural network, predict values with regression, and build upon regression techniques with logistic regression...
For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.
P.S. Want to reach our audience / fellow readers? Consider sponsoring - grab a spot now; first come first served!
All the best,
Hannah & Sebastian