Data Science Weekly

Jul 12, 2018

Issue #242, July 12, 2018

Editor Picks

Great Power, Great Responsibility: The 2018 Big Data & AI Landscape
Overview of the current state of A.I. by Matt Turck (FirstMark Capital). What's changed over the past year, where things are going, and the most important players...

Ways to think about machine learning
We're now four or five years into the current explosion of machine learning, and pretty much everyone has heard of it. It's not just that startups are forming every day or that the big tech platform companies are rebuilding themselves around it - everyone outside tech has read the Economist or BusinessWeek cover story, and many big companies have some projects underway. We know this is a Next Big Thing...

Given a satellite image, machine learning creates the view on the ground
Leonardo da Vinci famously created drawings and paintings that showed a bird’s eye view of certain areas of Italy with a level of detail that was not otherwise possible until the invention of photography and flying machines. Indeed, many critics have wondered how he could have imagined these details. But now researchers are working on the inverse problem: given a satellite image of Earth’s surface, what does that area look like from the ground? How clear can such an artificial image be?...

A Message from this week's Sponsor:

Don't Let the Model Myth Hold You Back

Are package acquisitions and approval processes slowing your ability to develop and deliver innovative models? A lot of your pain can be equated to the Model Myth - the notion that models should be treated like data or other digital assets like software. Models are fundamentally different and require a framework that embraces their differences. Read this paper to understand why models can’t be managed like other digital assets and to learn how to build this new organizational capability that is essential to remaining competitive in a model-driven world.

Data Science Articles & Videos

Troubling Trends in Machine Learning Scholarship
This paper aims to instigate discussion, answering a call for papers from the ICML Machine Learning Debates workshop. While we stand by the points represented here, we do not purport to offer a full or balanced viewpoint or to discuss the overall quality of science in ML...

How Many Random Seeds Should I Use? Statistical Power Analysis in (Deep) Reinforcement Learning Experiments
How many random seeds are needed to compare #DeepRL algorithms? Our new tutorial addresses this key issue of reproducibility in reinforcement learning...
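
As a rough illustration of what comparing two algorithms across seeds involves, here is a minimal sketch using only Python's standard library. It computes Welch's t statistic and the Welch-Satterthwaite degrees of freedom for two lists of final returns, one value per seed. The choice of Welch's test, the function name `welch_t`, and all the numbers are assumptions made for illustration; they are not taken from the tutorial.

```python
import math
import statistics


def welch_t(a, b):
    """Return (t, dof) for Welch's unequal-variance t-test on two samples."""
    na, nb = len(a), len(b)
    if na < 2 or nb < 2:
        raise ValueError("each sample needs at least two seeds")
    va, vb = statistics.variance(a), statistics.variance(b)  # sample variances
    sa, sb = va / na, vb / nb  # squared standard errors of the two means
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(sa + sb)
    # Welch-Satterthwaite approximation of the degrees of freedom
    dof = (sa + sb) ** 2 / (sa ** 2 / (na - 1) + sb ** 2 / (nb - 1))
    return t, dof


# Hypothetical final returns, one per random seed, for two algorithms.
algo_1 = [10.0, 12.0, 11.0, 13.0, 14.0]
algo_2 = [8.0, 9.0, 10.0, 9.0, 9.0]
t_stat, dof = welch_t(algo_1, algo_2)
print(f"t = {t_stat:.3f}, dof = {dof:.1f}")
```

With few seeds the standard errors are large, so the same gap in means yields a smaller t statistic; this is the intuition behind asking how many seeds a comparison needs.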

What Image Classifiers Can Do About Unknown Objects
A few days ago I received a question from Plant Village, a team I’m collaborating with, about a problem that’s emerged with a mobile app they’re developing. It detects plant diseases, and is delivering good results when it’s pointed at leaves, but if you point it at a computer keyboard it thinks it’s a damaged crop. This isn’t a surprising result to computer vision researchers, but it is a shock to most other people, so I want to explain why it’s happening, and what we can do about it...

State of AI
In this report, we set out to capture a snapshot of the exponential progress in AI with a focus on developments in the past 12 months. Consider this report as a compilation of the most interesting things we’ve seen that seeks to trigger informed conversation about the state of AI and its implication for the future...

An Intriguing Failing of Convolutional Neural Networks and the CoordConv Solution
Few ideas have enjoyed as large an impact on deep learning as convolution. For any problem involving pixels or spatial representations, common intuition holds that convolutional neural networks may be appropriate. In this paper we show a striking counterexample to this intuition via the seemingly trivial coordinate transform problem, which simply requires learning a mapping between coordinates in (x,y) Cartesian space and one-hot pixel space...
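
To make the coordinate transform task concrete, here is a minimal pure-Python sketch. `one_hot_pixel` builds the one-hot image for an (x, y) coordinate, and `add_coord_channels` appends row and column coordinate channels to a single-channel image, which is the general idea behind CoordConv. The function names, the grid size, and the scaling of coordinates to [-1, 1] are assumptions for illustration, not the paper's code.

```python
def one_hot_pixel(x, y, width, height):
    """Return a height x width grid of zeros with a single 1 at column x, row y."""
    if not (0 <= x < width and 0 <= y < height):
        raise ValueError("coordinate outside the grid")
    grid = [[0] * width for _ in range(height)]
    grid[y][x] = 1
    return grid


def add_coord_channels(image):
    """Stack [image, row coords, column coords]; coords are scaled to [-1, 1]."""
    height, width = len(image), len(image[0])

    def scale(i, n):
        # Map index i in 0..n-1 onto [-1, 1]; a single-element axis maps to 0.
        return 2.0 * i / (n - 1) - 1.0 if n > 1 else 0.0

    rows = [[scale(r, height) for _ in range(width)] for r in range(height)]
    cols = [[scale(c, width) for c in range(width)] for _ in range(height)]
    return [image, rows, cols]


# Hypothetical 4 x 3 grid with the "on" pixel at x=2, y=1.
target = one_hot_pixel(x=2, y=1, width=4, height=3)
channels = add_coord_channels(target)
```

The extra channels give every pixel explicit access to its own position, which a plain convolution, being translation-equivariant, does not have.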

Why businesses fail at machine learning
I’d like to let you in on a secret: when people say ‘machine learning’ it sounds like there’s only one discipline here. There are two, and if businesses don’t understand the difference, they can experience a world of trouble...

Text Generation using a RNN
Check out this end-to-end example of generating Shakespeare-like text using tf.keras + eager...

Tracking the Progress in Natural Language Processing
Research in Machine Learning and in Natural Language Processing (NLP) is moving so fast these days, it is hard to keep up... A number of resources exist that could help with this process, but each has deficits... As an alternative, I have created a GitHub repository that keeps track of the datasets and the current state-of-the-art for the most common tasks in NLP. The repository is kept as simple as possible to make maintenance and contribution easy. If I missed your favourite task or dataset or your new state-of-the-art result or if I made any error, you can simply submit a pull request...

Jobs

Junior Data Scientist - Dow Jones - NYC
We are looking for a Junior Analyst to join a specialized data team, focused on growing our subscription business and improving Dow Jones’ core products. The role will require technical expertise in handling large datasets, as well as an obsession with great news products. You will assist in the execution of large scale data projects and support daily reporting on core business functions...

Training & Resources

Use Torchvision CenterCrop Transform To Do A Square Crop Of A PIL Image
Learn how to use Torchvision CenterCrop Transform (torchvision.transforms.CenterCrop) to do a square crop of a PIL image, via a screencast video and full tutorial transcript...
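
For intuition about what a centered square crop does, here is a minimal sketch of the box arithmetic in plain Python. `center_crop_box` is a hypothetical helper, not part of Torchvision; it returns a (left, upper, right, lower) tuple in the order that PIL's `Image.crop` accepts, and it assumes the requested size fits inside the image.

```python
def center_crop_box(width, height, size):
    """Return (left, upper, right, lower) for a centered size x size crop."""
    if size > width or size > height:
        raise ValueError("crop size must fit inside the image")
    left = (width - size) // 2
    upper = (height - size) // 2
    return (left, upper, left + size, upper + size)


# Hypothetical 640 x 480 image cropped to a 224 x 224 square.
box = center_crop_box(width=640, height=480, size=224)
print(box)
```

With Pillow installed, `image.crop(box)` would apply the box to a PIL image. `torchvision.transforms.CenterCrop(224)` is the transform the tutorial itself covers, and its rounding may differ from this sketch by a pixel when the margins are odd.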

Intro to Keras Layers
In this article, we’ll work through some of the basic principles of deep learning, by discussing the fundamental building blocks in this exciting field. Take a look at some of the primary ingredients of getting started below, and don’t forget to bookmark this page as your Deep Learning cheat sheet!...

Setting up a Spark Cluster on AWS
Our goal for today is to build our own cluster with Spark. Fortunately for us, Amazon has made this pretty simple. We’re going to get started by going to AWS...

Books

Guesstimation: Solving the World's Problems on the Back of a Cocktail Napkin
"Guesstimation enables anyone with basic math and science skills to estimate virtually anything--quickly--using plausible assumptions and elementary arithmetic"...

For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.

P.S. Want to reach our audience / fellow readers? Consider sponsoring - grab a spot now; first come, first served!

All the best,
Hannah & Sebastian
