Data Science Weekly - Issue 151
Issue #151 Oct 13 2016
Editor Picks
What can people do better than machines? The view from 1951
What can humans do that machines can’t? The pessimistic view, in a world of advancing AI and robotics, would be “less and less every day”. While researching this today I ran into an interesting historical perspective — the view from 1951...It’s a paper called “Human Engineering for an Effective Air Navigation and Traffic Control System,” written for the National Security Council by Paul Fitts, a psychologist known for studying human factors in technology...
Open Data - Open for who?
I bracketed September by attending two open data conferences: one for scientists working with satellites, and one for librarians and archivists. Sitting in the audience at both events, it occurred to me over and over again that we’re not all talking about the same thing when we say ‘open data’...
Sharing Your Side Projects Online
(And Making Your GitHub The Best Résumé It Can Be)
For years, I've written one-off scripts and small programs to automate personal tasks and satisfy my curiosity. Until recently, I was never comfortable sharing this code online. In this talk, I will share good practices I've learned and developed for sharing my small projects online...The talk will include tips on writing reusable scripts, the basics of Git and GitHub, the importance of READMEs and software licenses, and creation of reproducible Python environments with Conda...Besides making your code more usable and accessible to others, the tips in this talk will help you make your GitHub profile a valuable component of your online résumé and open the door for others to improve your programs through GitHub pull requests...
A Message from this week's Sponsor:
O'Reilly Live Training: Real-time. Real experts. Real learning.
Master critical data technology through intensive, hands-on training led by instructors from O’Reilly’s network of expert practitioners. Learn through a combination of lectures, coding exercises, and Q&A — online, in real time.
Managing Enterprise Data Strategies with Hadoop, Spark, & Kafka
Oct 18-19
Professional Data Engineering with Hadoop and Spark
Nov 8, 9 & 10
Data Science Articles & Videos
The Commoditization of Machine Learning
In the machine learning world, we're moving away from the TCP/IP days into the Ruby on Rails days. With limited ML background, it's now much easier to build ML applications than it was even a few years ago. With the rapid development of new open source toolkits, we're truly seeing a rapid commoditization of the technology...
What You Need To Know About Data Augmentation For Machine Learning
There are many approaches to augmenting data. The simplest approaches include adding noise and applying transformations on existing data. Imputation and dimensional reduction can be used to add samples in sparse areas of the dataset. More advanced approaches include simulation of data based on dynamic systems or evolutionary systems. In this post we’ll focus on the two simplest approaches: adding noise and applying transformations...
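For readers who want to try the two simple approaches mentioned above, here is a minimal sketch of what they might look like in plain Python. It is not taken from the linked post: the function names (`add_gaussian_noise`, `flip_horizontal`, `augment`), the `sigma` value, and the toy data are all hypothetical and chosen only for illustration.

```python
import random


def add_gaussian_noise(samples, sigma=0.05, seed=0):
    """Return a jittered copy of each sample (a list of floats).

    sigma is an assumed noise scale; a sensible value depends on the data.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return [[x + rng.gauss(0.0, sigma) for x in row] for row in samples]


def flip_horizontal(image):
    """Mirror a 2D image (a list of rows) left to right."""
    return [list(reversed(row)) for row in image]


def augment(samples, sigma=0.05, seed=0):
    """Return the original samples followed by one noisy copy of each."""
    return samples + add_gaussian_noise(samples, sigma, seed)


# Toy data, purely illustrative.
samples = [[0.0, 1.0], [2.0, 3.0]]
augmented = augment(samples)

image = [[1, 2, 3], [4, 5, 6]]
flipped = flip_horizontal(image)
```

Noise is most natural for continuous features, while a flip only makes sense when the mirrored example is still a valid member of its class (a mirrored cat is still a cat, a mirrored digit may not be).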
Deep Reinforcement Learning (DDPG) for 3D Car Racing Simulation TORCS
In this project we will demonstrate how to use the Deep Deterministic Policy Gradient (DDPG) algorithm together with Keras to play TORCS (The Open Racing Car Simulator), a very interesting AI racing game and research platform...
The Data of the Chicago Marathon
Lessons learned from 400,000 Chicago Marathoners (2005–2016)
All in all I have collected, cleaned, and compared more than 416,000 finisher records and in what follows I will try to summarise some of the more interesting findings...
The Problem With p-values
Academic psychology and medical testing are both dogged by unreliability. The reason is clear: we got probability wrong...
Data Mining Hacker News: Front vs Back
Analysis of HN content typically only focuses on what made the front page and when. Rarely do you see an analysis of what did not make the front page. Given the exact same piece of content (news, blog post, software project, etc.), why does an early submission never reach the front page while a later resubmission does?...
Predict the value of your house using Azure Machine Learning
In this video, Raja walks us through the steps of building a machine learning model using the Azure Machine Learning Studio, to predict the real estate sales price of a house based upon various historical features about the house and the sales transaction. You can access the model used in this conversation right from the Cortana Intelligence Gallery...
Predicting Student Demand at Lingo Live
As part of the TechTeam at Lingo Live one of my responsibilities is to provide accurate forecasts on our ability to handle future students. Do we have capacity to handle an influx of new students? If so, how many more? If not, how many more teachers do we need to hire?...
Jobs
Data Scientist - WebMD - New York, NY
Medscape, a division of WebMD, develops and hosts physician portals and related mobile applications that make it easier for physicians and healthcare professionals to access clinical reference sources, stay abreast of the latest clinical information, learn about new treatment options, earn continuing medical education credits and communicate with peers.
The Data Scientist, Outcomes Research for Medscape Education will determine plans for research and execute statistical analyses to demonstrate the value of Medscape Education programs to funders and the education enterprise. The ideal candidate has strong technological, statistical, and analytical skills; is collaborative, creative, and curious; and can recognize and solve problems independently...
Training & Resources
BinaryTree: Python Library for Learning Binary Trees
BinaryTree is a minimal Python library which provides you with a simple API to generate, visualize and inspect binary trees so you can skip the tedious work of mocking up test trees, and dive right into practising your algorithms! Heaps and BSTs (binary search trees) are also supported...
pandasql: Make python speak SQL
This post is about pandasql, a Python package we (Yhat) wrote that emulates the R package sqldf. It's a small but mighty library composed of just 358 lines of code. The idea of pandasql is to make Python speak SQL. For those of you who come from a SQL-first background or still "think in SQL", pandasql is a nice way to take advantage of the strengths of both languages...
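To give a feel for the "think in SQL" idea without assuming pandasql is installed, here is a rough sketch that uses only Python's standard-library `sqlite3` module. This is not pandasql's API and does not touch pandas at all; it simply shows SQL being run against in-memory Python data. The table name `people` and the rows are hypothetical.

```python
import sqlite3

# Hypothetical in-memory data: (name, age) tuples.
rows = [("alice", 34), ("bob", 27), ("carol", 41)]

# Load the rows into a throwaway in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO people VALUES (?, ?)", rows)

# Filter and sort with SQL rather than Python loops.
older = conn.execute(
    "SELECT name, age FROM people WHERE age > 30 ORDER BY age DESC"
).fetchall()
conn.close()
```

The linked post covers how pandasql applies the same idea directly to pandas DataFrames.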
Julia 0.5 Release Highlights
Julia 0.5 is a pivotal release. It introduces more transformative features than any release since the first official version. Moreover, several of these features set the stage for even more to come in the lead up to Julia 1.0. In this post, we’ll go through some of the major changes in 0.5, including improvements to functional programming, comprehensions, generators, arrays, strings, and more...
Books
The Second Machine Age:
Work, Progress, and Prosperity in a Time of Brilliant Technologies
In The Second Machine Age, MIT’s Erik Brynjolfsson and Andrew McAfee, two thinkers at the forefront of their field, reveal the forces driving the reinvention of our lives and our economy...
For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages, check out our resources page.
P.S. Interested in reaching fellow readers of this newsletter? Consider sponsoring! Email us for details :)
All the best, Hannah & Sebastian