Data Science Weekly - Issue 153
Issue #153 Oct 27 2016
Editor Picks
What to Know Before You Get In a Self-driving Car
Uber thinks its self-driving taxis could change the way millions of people get around. But autonomous vehicles aren’t anywhere near to being ready for the roads...
Andrew Ng: Why Artificial Intelligence Is the New Electricity
Andrew Ng, chief scientist at the Chinese web services company Baidu, explains to Inc. reporter Salvador Rodriguez why breakthroughs in machine learning will transform countless industries...
Why I’m Teaching Twitch to Predict the Future
Predicting the future is not a theoretical superpower. It is a skill we are already relying on to make decisions, and like any other skill it can be rapidly improved with deliberate practice. Unsurprisingly, deliberate practice looks like making predictions within a domain and comparing the results to reality. This year I’ve built a culture of doing just that at Twitch. It’s the most exciting work I’ve ever done...
A Message from this week's Sponsor:
O'Reilly Live Training: Real-time. Real experts. Real learning.
Join Google’s Eli Bixby and Amy Unruh for a two-day, hands-on, in-depth exploration of TensorFlow, Google’s open source tensor computation framework that provides a powerful platform on which to build deep learning models. You’ll come away with a solid understanding of TensorFlow and the ability to successfully run a machine learning workflow.
Machine Learning + TensorFlow Training
November 1-2 | San Francisco, CA
Learn more >
Data Science Articles & Videos
Chatbots with Social Skills Will Convince You to Buy Something
Virtual assistants that can read social cues and nonverbal signals are less jarring—and surprisingly persuasive...
How D.C. Grew a Data-Driven Tree Strategy
One of the biggest benefits of data is its versatility. Washington, D.C.’s Urban Forestry Administration (UFA) turned to tech to answer a simple question, but ended up using the resulting information to reconfigure much of its mission and work. In the process, UFA set an example for cities everywhere about how powerful a new set of data can be when creatively leveraged...
From Superstar Culture to Moneyball: How Data is Changing the NBA
Data is quickly overtaking every industry. Business decisions on everything from how much inventory to buy to how many employees to hire are being influenced by newly available data points and the continually decreasing cost of computing power. These same drivers are creating an “everywhere analytics” culture that impacts us everywhere, and that includes professional basketball. So in honor of the tip-off of the 2016-2017 NBA season, let’s take a look at how data is revolutionizing basketball...
Python and Machine Learning in Astronomy
The advances in astronomy over the past century are both evidence and confirmation of the highest heights of human ingenuity. We have learned by studying the frequency of light that the universe is expanding, and by observing the orbit of Mercury that Einstein's theory of general relativity is correct. It probably won't surprise you to learn that Python and data science play a central role in modern-day astronomy...
Influence of Pokémon Go on Physical Activity: Study and Implications
Pokémon Go increased U.S. activity levels by 144 billion steps in just 30 days...
Building an efficient neural language model over a billion words
In the area of language modeling, recent advances have been made leveraging massively large models that could only be trained on a large GPU cluster for weeks at a time. While impressive, these processing-intensive practices favor exploring on large computational infrastructures that are typically too expensive for academic environments and impractical in a production setting, limiting the speed of research, reproducibility, and usability of the results. Recognizing this computational bottleneck, Facebook AI Research (FAIR) designed a novel softmax function approximation tailored for GPUs to efficiently train neural network based language models over very large vocabularies...
DynamoDB Learnings
At Hinge, we have been using DynamoDB in production for more than 8 months and we just relaunched two weeks ago with full capacity. I want to share a couple of learnings and why it made sense for us to store ratings in DynamoDB, since I own the rating processing in the application. We are processing millions of ratings per day, and so far DynamoDB is holding up pretty well. Ratings are also crucial for our recommender to get smarter, so they need to be handled with care...
Automatic chemical design using a data-driven continuous representation of molecules
We report a method to convert discrete representations of molecules to and from a multidimensional continuous representation. This generative model allows efficient search and optimization through open-ended spaces of chemical compounds. We train deep neural networks on hundreds of thousands of existing chemical structures to construct two coupled functions: an encoder and a decoder...
Jobs
Data Scientist (Post Grad Programme) - Capital One - Nottingham, UK
Data Scientists at Capital One are continually increasing our understanding of our customers and of the markets in which we operate. So, if you have in-depth knowledge of R, experience in extracting nuggets of insight from huge amounts of data and a Masters or PhD background, we think you’ll feel right at home in our team...
Training & Resources
PyWren
So I wrote PyWren in my "spare time" (fellow postdocs will get why this is in quotes) to let you do exactly this. It's a microservices-Condor, a Wren! It's basically just "map-reduce" minus the "reduce" using AWS Lambda and some awesome python serialization technology originally developed by a now-defunct company called PiCloud that offered a similar service that I loved...
Pandas Tutorial: DataFrames in Python
With this tutorial, DataCamp wants to address 11 of the most popular Pandas DataFrame questions so that you understand, and avoid, the doubts of the Pythonistas who have gone before you...
Clustering: A Guide for the Perplexed
Finding clusters is a powerful tool for understanding and exploring data. While the task sounds easy, it can be surprisingly difficult to do it well. Most standard clustering algorithms can, and do, provide very poor clustering results in many cases. We discuss how to do clustering correctly...
Books
Statistics As Principled Argument
"This is a great book. Everyone who uses statistics in any way should read it. Maybe everyone who READS articles that contain statistics should read it! The mathematics is minimal (very few formulas, and those are basic), but a lot of very good advice on how to use statistics sensibly (and how it is sometimes used nonsensically!)"...
For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.
P.S. Interested in reaching fellow readers of this newsletter? Consider sponsoring! Email us for details :)
All the best, Hannah & Sebastian