Data Science Weekly - Issue 139
Issue #139 July 21 2016
Editor Picks
A neural network tried to write a 9th Harry Potter book, and the results are hilarious
“I’ve been experimenting with deep learning over the past few weeks, and the Harry Potter story is the result of one of those experiments,” creator Max Deutsch tells Digital Trends. “Beyond just looking for a fun way to practice what I’ve been learning, the Harry Potter project was an attempt to make something enjoyable to read.”...
Creating a Beer Recommendation Engine
Beer is one of my passions. I’m an award-winning homebrewer. I’ve judged beer competitions. I’m an active member in my local homebrewing clubs. I’ve reviewed just under 1000 unique beers on Untappd. I have several floorplan ideas for the taproom of the craft brewery I will definitely someday own. I’m drinking a beer while I’m writing this post. My goal is to create a recommendation engine for beer that is actually useful...
Central Limit Theorem – Interactive Data Visualization Explanation
This is an attempt to visually explain the core concepts of the Central Limit Theorem. By providing a variety of interactive components, this page seeks to provide an intuitive understanding of one of the foundational theories behind inferential statistics. It draws inspiration from other visual explanations...
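For readers who prefer code to sliders, here is a minimal sketch of the same idea, not taken from the linked page. It assumes an exponential population with rate 1 (mean 1, standard deviation 1); the helper names sample_means and exponential_draw, the sample size of 50, and the seed are all illustrative choices.

```python
import random
import statistics


def sample_means(draw, sample_size, num_samples, rng):
    """Return the mean of each of num_samples samples of size sample_size."""
    return [
        statistics.fmean(draw(rng) for _ in range(sample_size))
        for _ in range(num_samples)
    ]


def exponential_draw(rng):
    # Assumed population: exponential with rate 1, which is strongly skewed
    # and has mean 1 and standard deviation 1.
    return rng.expovariate(1.0)


rng = random.Random(139)  # fixed seed so the run is repeatable
sample_size = 50
means = sample_means(exponential_draw, sample_size, num_samples=4000, rng=rng)

center = statistics.fmean(means)          # should sit near the population mean, 1
spread = statistics.stdev(means)          # should sit near sigma / sqrt(n)
predicted_spread = 1.0 / sample_size ** 0.5

# Share of sample means within one standard error of the population mean.
# For a normal distribution this share is roughly 0.68.
within_one_se = sum(abs(m - 1.0) <= predicted_spread for m in means) / len(means)

print(center, spread, predicted_spread, within_one_se)
```

Even though individual draws are skewed, the sample means cluster symmetrically around the population mean, and their spread shrinks as the sample size grows.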
A Message from this week's Sponsor:
Where science and policy change the world. And You.
Apply your knowledge & skills to federal policy via the AAAS Science & Technology Policy Fellowships. A year-long professional development opportunity for doctoral level data scientists to serve in the federal government in Washington, D.C.
STPF fosters a career-enhancing network of science leaders who understand policymaking & contribute to society...
Data Science Articles & Videos
What we’ve learned about brands in London from 5 million Instagram posts
Like any modern fashion mecca and large financial center, London is big on Instagram, so it’s not surprising that it is the most Instagrammed city in Great Britain and the 2nd in the world, after New York and followed by Paris. What do Londoners and guests of the city Instagram about? What places do they like the most? Where do they feel miserable?...
Data Science at Zymergen
Zymergen is an SF Bay Area startup that uses software, robotics, and advanced genetic engineering techniques to make industrial microbes more effective at producing particular chemicals, or even to create brand new compounds. I recently joined Zymergen to manage the Core Infrastructure and Data Science teams...
Simple R algorithm to find the Rectangularness of Countries
A Facebook friend recently noted that Turkey was "a remarkably rectangular country." I wondered how it compared to other countries, and this post shows my answers: Turkey is 15th; Egypt is the most rectangular...
Using Keras and Deep Q-Network to Play FlappyBird
200 lines of Python code to demonstrate DQN with Keras...
A Review of Travel Chatbots
Travel search is one of the most common use cases for chatbots, so we reviewed the 5 main travel bots on Facebook Messenger... Some bots understand complex text input, others require information provided one piece at a time in a ping-pong conversation...
Probabilistic Filters By Example
Probabilistic filters are high-speed, space-efficient data structures that support set-membership tests with a one-sided error. These filters can claim that a given entry is definitely not represented in a set of entries, or might be represented in the set. That is, negative responses are conclusive, whereas positive responses incur a small false positive probability (FPP). Below is a side-by-side simulation of the inner workings of Cuckoo and Bloom filters...
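To make the one-sided error concrete, here is a minimal Bloom filter sketch. It is not the linked article's code: the class name, the 1024-bit size, the 3 hash functions, and the salted SHA-256 hashing scheme are all assumptions chosen for readability, and the Cuckoo filter is not covered.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: a bit array plus several salted hash functions."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # False means "definitely not added"; True means "possibly added".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


bloom = BloomFilter()
added = [f"user-{n}" for n in range(50)]
for name in added:
    bloom.add(name)

# Every added item is reported as possibly present: no false negatives.
no_false_negatives = all(bloom.might_contain(name) for name in added)

# Items that were never added may occasionally collide: false positives.
absent = [f"guest-{n}" for n in range(1000)]
false_positives = sum(bloom.might_contain(name) for name in absent)

print(no_false_negatives, false_positives)
```

A larger bit array or fewer stored items lowers the false positive probability, at the cost of more memory per entry.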
Virtual Worlds as Proxy for Multi-Object Tracking Analysis
Modern computer vision algorithms typically require expensive data acquisition and accurate manual labeling. In this work, we instead leverage the recent progress in computer graphics to generate fully labeled, dynamic, and photo-realistic proxy virtual worlds...
Understanding Bias: A Pre-requisite For Trustworthy Results
It turns out that it’s shockingly easy to do some very reasonable things with data (aggregate, slice, average, etc.), and come out with answers that have 2000% error! In this post, I want to show why that’s the case using some very simple, intuitive pictures...
Jobs
Senior Data Scientist - British Geological Survey - Keyworth, UK
The British Geological Survey is one of the world's leading and forward-thinking geological science institutes. A vacancy has arisen for a Senior Data Scientist in Keyworth, Nottingham.
Starting salary is £35,222 pa to £38,254+ pa depending on qualifications and experience.
To apply, please go to http://www.topcareer.jobs/home/BritishGeologicalSurvey.aspx and submit your CV and covering letter. Applicants who would like to receive this advert in an alternative format (e.g. large print, Braille, audio or hard copy), or who are unable to apply online should telephone 01793 867003.
Closing date is 24 July 2016.
Training & Resources
Fast Recommendations for Activity Streams Using Vowpal Wabbit
In this post I will explore some techniques that can be used to generate recommendations and predictions using the amazingly fast Vowpal Wabbit library...
Tuning a scikit-learn estimator with skopt
Tuning the hyper-parameters of a machine learning model is often carried out using an exhaustive exploration of (a subset of) the space of all hyper-parameter configurations (e.g., using sklearn.model_selection.GridSearchCV), which often results in a very time-consuming operation. In this notebook, we illustrate how skopt can be used to tune hyper-parameters using sequential model-based optimisation, hopefully resulting in equivalent or better solutions, but within fewer evaluations...
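To see why exhaustive exploration gets expensive, here is a small standard-library sketch of a grid search that counts its own evaluations. It does not use skopt or scikit-learn; toy_loss is a hypothetical stand-in for a cross-validated model score, and the grid values are made up for illustration.

```python
import itertools


def toy_loss(learning_rate, depth):
    # Hypothetical objective standing in for a cross-validated loss.
    # By construction its minimum is at learning_rate=0.1, depth=5.
    return (learning_rate - 0.1) ** 2 + 0.01 * (depth - 5) ** 2


def grid_search(objective, grid):
    """Evaluate every combination in grid; return best params, score, count."""
    names = list(grid)
    best_params, best_score, evaluations = None, float("inf"), 0
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = objective(**params)
        evaluations += 1
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score, evaluations


grid = {
    "learning_rate": [0.001, 0.01, 0.1, 1.0],
    "depth": [1, 3, 5, 7, 9],
}
best_params, best_score, evaluations = grid_search(toy_loss, grid)

# The evaluation count is the product of the grid sizes (here 4 * 5),
# so each extra hyper-parameter multiplies the cost.
print(best_params, best_score, evaluations)
```

Sequential model-based optimisation aims to reduce that count by using earlier evaluations to choose which configuration to try next, rather than visiting every grid point.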
How to Construct a Basic Line Chart in D3
Video tutorial where you will use the TSV Data from the D3js.org website Line Chart Example to see how a full D3 Line Chart data visualization is built...
Books
Bayesian Methods for Hackers: Probabilistic Programming and Bayesian Inference
Master Bayesian Inference through Practical Examples and Computation–Without Advanced Mathematical Analysis...
For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.
P.S. Interested in reaching fellow readers of this newsletter? Consider sponsoring! Email us for details :) - All the best, Hannah & Sebastian