Data Science Weekly - Issue 256
Issue #256 Oct 18 2018
Editor Picks
The Economist's Big Mac Index is calculated with R
Since its inception in 1986, the Big Mac Index has been compiled and calculated manually, twice a year. But starting with the most recent published index, the index is now calculated with R. This is the first example of a new program at The Economist to publish the data and methods behind its journalism, and here the data and code behind the Big Mac Index have been published as a GitHub repository...
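For readers who have not seen the calculation, here is a minimal Python sketch of the raw index as it is usually described: the implied exchange rate is the local burger price divided by the US price, and the valuation compares that with the market rate. This is not The Economist's published R code; the function name `big_mac_index` and all prices and rates below are made up for illustration.

```python
def big_mac_index(local_price, usd_price, actual_rate):
    """Return (implied_rate, valuation) for one currency against the US dollar.

    local_price: Big Mac price in the local currency
    usd_price:   Big Mac price in US dollars
    actual_rate: market exchange rate, in local currency units per US dollar
    """
    # Exchange rate at which the burger would cost the same in both countries.
    implied_rate = local_price / usd_price
    # Negative means the local currency looks undervalued against the dollar,
    # positive means it looks overvalued.
    valuation = implied_rate / actual_rate - 1.0
    return implied_rate, valuation


# Hypothetical inputs: a burger costs 20 local units and 5 US dollars,
# and one dollar buys 8 local units on the market.
implied, valuation = big_mac_index(local_price=20.0, usd_price=5.0, actual_rate=8.0)
print(f"implied rate: {implied:.2f}, valuation: {valuation:+.0%}")
```

With these made-up numbers the implied rate is 4 local units per dollar against a market rate of 8, so the local currency comes out 50% undervalued.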
Will Compression Be Machine Learning’s Killer App?
When I talk to people about machine learning on phones and devices I often get asked “What’s the killer application?”. I have a lot of different answers, everything from voice interfaces to entirely new ways of using sensor data, but the one I’m most excited about in the near-term is compression. Despite being fairly well-known in the research community, this seems to surprise a lot of people, so I wanted to share some of my personal thoughts on why I see compression as so promising...
Your next doctor’s appointment might be with an AI
A new wave of chatbots are replacing physicians and providing frontline medical advice—but are they as good as the real thing?...
A Message from this week's Sponsor:
Beginner Data Science Courses - Live & Online
Leading data science training provider, Metis, allows you the flexibility to skill up in data science from anywhere. Learn from industry experts in a unique Live Online format where you’ll interact with instructors and classmates in real time during class. Browse our selection of beginner-level courses and enroll today!
View Courses
Data Science Articles & Videos
M.I.T. Plans College for Artificial Intelligence, Backed by $1 Billion
Every major university is wrestling with how to adapt to the technology wave of artificial intelligence — how to prepare students not only to harness the powerful tools of A.I., but also to thoughtfully weigh its ethical and social implications. A.I. courses, conferences and joint majors have proliferated in the last few years. But the Massachusetts Institute of Technology is taking a particularly ambitious step, creating a new college backed by a planned investment of $1 billion...
Resource-efficient Machine Learning in 2 KB RAM for the Internet of Things
This paper develops a novel tree-based algorithm, called Bonsai, for efficient prediction on IoT devices – such as those based on the Arduino Uno board having an 8 bit ATmega328P microcontroller operating at 16 MHz with no native floating point support, 2 KB RAM and 32 KB read-only flash...
Consistently Beautiful Visualizations with Altair Themes
In this piece we’ll be digging deeper into one of altair's less known features: themes...
Applying Deep Learning to Metastatic Breast Cancer Detection
A pathologist’s microscopic examination of a tumor in patients is considered the gold standard for cancer diagnosis, and has a profound impact on prognosis and treatment decisions. One important but laborious aspect of the pathologic review involves detecting cancer that has spread (metastasized) from the primary site to nearby lymph nodes. Detection of nodal metastasis is relevant for most cancers, and forms one of the foundations of the widely-used TNM cancer staging...
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations by jointly conditioning on both left and right context in all layers...
Neural networks don’t understand what optical illusions are
Machine-vision systems can match humans at recognizing faces and can even create realistic synthetic faces. But researchers have discovered that the same systems cannot recognize optical illusions, which means they also can’t create new ones...
Patient2Vec: A Personalized Interpretable Deep Representation of the Longitudinal Electronic Health Record
The wide implementation of electronic health record (EHR) systems facilitates the collection of large-scale health data from real clinical settings. Despite the significant increase in adoption of EHR systems, this data remains largely unexplored, but presents a rich data source for knowledge discovery from patient health histories in tasks such as understanding disease correlations and predicting health outcomes. In this paper, we propose a computational framework, Patient2Vec, to learn an interpretable deep representation of longitudinal EHR data which is personalized for each patient...
Image-Based eCommerce Product Discovery: A Deep Learning Case Study
To further improve discoverability of Macy’s product catalog online we introduced an easy shopping experience for finding products which are hard to describe using text-based search. In this talk, Macy’s engineers share how they’ve implemented visual similarity using CNNs...
Jobs
Data Scientist - Hearts & Science - NYC
As every CMO knows, applying technology, data and analytics to business challenges is the cost of entry, not a nice-to-have. Unlocking the value of consumer data can deliver growth, revenue and ROI, and create differentiation to stay on top. Real-time action is instrumental in driving results within a highly fragmented marketplace. Hearts & Science’s Marketing Science solutions are built to deliver insightful results with speed, accuracy and a single version of the truth. In leveraging the DNA of a marketing agency with a talent pool of developers, data scientists and Ph.D.s, we hold a unique position in the increasingly crowded ad tech and consulting space...
Training & Resources
How To Define A Convolutional Layer In PyTorch
Learn how to use PyTorch nn.Sequential and PyTorch nn.Conv2d to define a convolutional layer in PyTorch, via a screencast video and full tutorial transcript...
Training Neural Nets on Larger Batches: Practical Tips for 1-GPU, Multi-GPU & Distributed setups
How can you train your model on large batches when your GPU can’t hold more than a few samples? There are several tools, tips and tricks you can use to do that and I thought it would be nice to gather all the things I use and learned in a post...
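One widely used trick for this situation is gradient accumulation: run several small micro-batches, sum their scaled gradients, and only then take an optimizer step. The excerpt above does not name the technique, so treat the following as an illustrative sketch in plain Python on a toy one-parameter model, not as the post's own code. The helper names `mse_grad` and `accumulated_grad` and the data are assumptions.

```python
def mse_grad(w, xs, ys):
    """Gradient of the mean squared error for the toy model y = w * x."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def accumulated_grad(w, xs, ys, micro_batch_size):
    """Accumulate gradients over equally sized micro-batches.

    Each micro-batch gradient is divided by the number of accumulation
    steps, so the running total matches the gradient of one large batch.
    """
    if len(xs) % micro_batch_size != 0:
        raise ValueError("this sketch assumes equally sized micro-batches")
    n_steps = len(xs) // micro_batch_size
    total = 0.0
    for step in range(n_steps):
        lo = step * micro_batch_size
        hi = lo + micro_batch_size
        # Only one micro-batch needs to be "in memory" at a time.
        total += mse_grad(w, xs[lo:hi], ys[lo:hi]) / n_steps
    return total


# Hypothetical toy data: the true relationship is y = 2 * x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w = 0.5

full = mse_grad(w, xs, ys)
accumulated = accumulated_grad(w, xs, ys, micro_batch_size=2)
print(full, accumulated)
```

The two gradients agree, which is why accumulating over micro-batches can stand in for a batch that does not fit on the device. In a deep learning framework the same idea usually amounts to calling the backward pass several times before a single optimizer step.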
Discriminator Rejection Sampling
We propose a rejection sampling scheme using the discriminator of a GAN to approximately correct errors in the GAN generator distribution. We show that under quite strict assumptions, this will allow us to recover the data distribution exactly...
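To make the idea concrete, here is a small Python sketch of the idealized case on a three-value toy distribution. It assumes a perfect discriminator D(x) = p_data(x) / (p_data(x) + p_gen(x)), which lets the density ratio be read off as D / (1 - D). This is the textbook rejection sampling setup under that assumption, not the paper's practical algorithm; `P_DATA`, `P_GEN`, and every function name are hypothetical.

```python
import random

# Hypothetical toy distributions over three outcomes.
P_DATA = {0: 0.5, 1: 0.3, 2: 0.2}  # what we would like to sample from
P_GEN = {0: 0.2, 1: 0.3, 2: 0.5}   # what the imperfect generator produces


def ideal_discriminator(x):
    """Optimal discriminator output: probability that x came from the data."""
    return P_DATA[x] / (P_DATA[x] + P_GEN[x])


def density_ratio(x):
    """Recover p_data(x) / p_gen(x) from the discriminator as D / (1 - D)."""
    d = ideal_discriminator(x)
    return d / (1.0 - d)


# Upper bound on the density ratio, needed so acceptance probabilities are <= 1.
M = max(density_ratio(x) for x in P_GEN)


def acceptance_probability(x):
    return density_ratio(x) / M


def rejection_sample(rng):
    """Draw from the generator until a sample is accepted."""
    outcomes = list(P_GEN)
    weights = list(P_GEN.values())
    while True:
        x = rng.choices(outcomes, weights=weights)[0]
        if rng.random() < acceptance_probability(x):
            return x


rng = random.Random(0)
samples = [rejection_sample(rng) for _ in range(20000)]
freqs = {x: samples.count(x) / len(samples) for x in P_DATA}
print(freqs)
```

The accepted samples should follow `P_DATA` closely even though every draw came from `P_GEN`. With a real, imperfect discriminator the correction is only approximate, which matches the hedged wording of the abstract.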
Books
Data Visualization with Python and JavaScript: Scrape, Clean, Explore & Transform Your Data
Learn how to turn raw data into rich, interactive web visualizations with the powerful combination of Python and JavaScript. With this hands-on guide, author Kyran Dale teaches you how to build a basic dataviz toolchain with best-of-breed Python and JavaScript libraries (including Scrapy, Matplotlib, Pandas, Flask, and D3) for crafting engaging, browser-based visualizations...
For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.
P.S. Want to reach our audience / fellow readers? Consider sponsoring - grab a spot now; first come, first served!
All the best, Hannah & Sebastian