[in case you missed it] Data Science Weekly - Issue 379

Feb 28, 2021

Issue #379 Feb 25 2021

Editor Picks

I wrote a book about using data science to solve “everyday” problems
I've always wanted to write a book. I have helped write 3 different deeply technical books (and one solutions manual), but I wanted something fun, interesting, and valuable...So I wrote "Everyday Data Science" which is a collection of stories, tutorials, jokes, math, and code all written to inspire people to analyze their personal data...In general, I was also inspired by the challenge to "make $100 online" which I have done in the past month since launching. It was daunting, and I felt quite vulnerable, but overall I'm pleased with what I've made...I wrote up this quick post to give you an idea of the process I followed to write the book, and some of the content...

Python Concurrency: The Tricky Bits
As a data scientist who is spending more time on software engineering, I was recently forced to confront an ugly gap in my knowledge of Python: concurrency. To be honest, I never completely understood how the terms async, threads, pools and coroutines were different and how these mechanisms could work together. Every time I tried to learn about the subject, the examples were a bit too abstract for me, and I had a hard time internalizing how everything worked...This changed when a friend of mine recommended a live coding talk by David Beazley, an accomplished Python educator...This blog post documents what I learned along the way so others can benefit, too...

Mike Bostock (Creator of D3.js): 10 Years of Open-Source Visualization...Did I learn anything from D3.js? Let’s see...
In honor of D3 1.0’s tenth anniversary, I thought I’d reflect on lessons learned. This isn’t intended to be too comprehensive or serious, just a handful of observations as I look ahead to the next ten years. But I hope a nugget or two will interest you, too...

A Message from this week's Sponsor:

Feature store: The data platform for building, deploying, and using ML features

The Uber Michelangelo team built the first feature store to scale Uber’s Machine Learning to 1000s of production models in just a few years. Feature stores have now become an essential part of the modern stack for operational ML. They bring DevOps principles to ML data, and allow data scientists to build great ML features, get them to production instantly, and share them across teams. Mike Del Balso, Co-Founder of Tecton, and Willem Pienaar, creator of Feast, teamed up to provide a joint definition of feature stores and how they can solve the data problem for ML.

Data Science Articles & Videos

Workloads of Counting Queries: Enabling Rich Statistical Analyses with Differential Privacy
In this post, we will look at answering a collection of counting queries—which we call a workload—under differential privacy. This has been the subject of considerable research effort because it captures several interesting and important statistical tasks. By analyzing the specific workload queries carefully, we can design very effective mechanisms for this task that achieve low error...

First return, then explore
We introduce Go-Explore, a family of algorithms that addresses these two challenges directly through the simple principles of explicitly ‘remembering’ promising states and returning to such states before intentionally exploring. Go-Explore solves all previously unsolved Atari games and surpasses the state of the art on all hard-exploration games, with orders-of-magnitude improvements on the grand challenges of Montezuma’s Revenge and Pitfall. We also demonstrate the practical potential of Go-Explore on a sparse-reward pick-and-place robotics task...

Towards Simple, Interpretable, and Trustworthy AI
In this episode of the Data Exchange [Podcast] I speak with Sheldon Fernandez, CEO at DarwinAI, and Alex Wong, Professor at the University of Waterloo and Co-Founder of DarwinAI (Chief Scientist) and Euclid Labs, about building tools to help companies operationalize machine learning and AI...

Fast Inverse Square Root — A Quake III Algorithm [Video]
In this video we will take an in-depth look at the fast inverse square root and see where the mysterious number 0x5f3759df comes from. This algorithm became famous after id Software open sourced the engine for Quake III. On the way we will also learn about floating point numbers and Newton's method...
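
For readers who want to try the trick before watching, here is a minimal Python sketch of the idea. It is a transliteration of the widely circulated routine, not code from the video; the function name `fast_inverse_sqrt` is our own, and `struct` stands in for the pointer cast used in C.

```python
import struct


def fast_inverse_sqrt(x: float) -> float:
    """Approximate 1/sqrt(x) for a positive x using the 0x5f3759df bit trick."""
    # Reinterpret the 32-bit float's bit pattern as an unsigned integer.
    i = struct.unpack("<I", struct.pack("<f", x))[0]
    # Subtracting the halved bits from the magic constant yields a rough first guess.
    i = 0x5F3759DF - (i >> 1)
    # Reinterpret the integer bits as a 32-bit float again.
    y = struct.unpack("<f", struct.pack("<I", i))[0]
    # One step of Newton's method refines the guess.
    return y * (1.5 - 0.5 * x * y * y)


approx = fast_inverse_sqrt(4.0)  # close to the exact value 0.5
```

A single Newton step already brings the result to within a fraction of a percent of the exact value for the inputs tried here; the sketch only makes sense for positive, finite inputs.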

The Technology Behind Cinematic Photos
Looking at photos from the past can help people relive some of their most treasured moments. Last December we launched Cinematic photos, a new feature in Google Photos that aims to recapture the sense of immersion felt the moment a photo was taken, simulating camera motion and parallax by inferring 3D representations in an image. In this post, we take a look at the technology behind this process, and demonstrate how Cinematic photos can turn a single 2D photo from the past into a more immersive 3D animation...

Recent Advances in Language Model Fine-tuning
This article provides an overview of recent methods to fine-tune large pre-trained language models...

Launching the Facebook Map
For the past year and a half, it’s been our privilege to work on one of our largest and most ambitious undertakings ever: collaborating closely with a team of Facebook engineers, designers, and data experts to roll out a global, multi-scale base map for all of Facebook’s billions of users. In late 2020, this map went live, and we’re extremely proud of the results...Here's how we did it, what we did, and why...

A Data Pipeline is a Materialized View
Materialized views never saw widespread adoption as a primary tool for building data pipelines, likely due to their limitations and ties to relational database technologies. Perhaps with this new wave of tools like dbt and Materialize we’ll see materialized views used more heavily as a primary building block in the typical data pipeline...Regardless of whether we see that kind of broad change, materialized views are still a useful design tool for conceptualizing what we are doing when we build data pipelines...
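
To make the analogy concrete, here is a rough sketch, assuming a toy `orders` table of our own invention. SQLite has no native materialized views, so the derived table `customer_totals` and the helper `refresh_customer_totals` are hypothetical stand-ins: the stored query result plays the role of the view, and each refresh plays the role of a pipeline run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ana", 10.0), ("ben", 5.0), ("ana", 2.5)],
)


def refresh_customer_totals(conn):
    """Rebuild the derived table from the source table (one 'pipeline run')."""
    conn.execute("DROP TABLE IF EXISTS customer_totals")
    conn.execute(
        "CREATE TABLE customer_totals AS "
        "SELECT customer, SUM(amount) AS total FROM orders GROUP BY customer"
    )


def read_totals(conn):
    """Read the stored result without recomputing the query."""
    return dict(conn.execute("SELECT customer, total FROM customer_totals"))


refresh_customer_totals(conn)
first = read_totals(conn)

# New source data does not show up until the next refresh.
conn.execute("INSERT INTO orders VALUES ('ben', 1.0)")
stale = read_totals(conn)
refresh_customer_totals(conn)
fresh = read_totals(conn)
```

The gap between `stale` and `fresh` is the same freshness question every scheduled pipeline has to answer; tools that maintain such views incrementally aim to shrink that gap.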

Autonomous navigation of stratospheric balloons using Reinforcement Learning [Video]
Marlos C. Machado: Efficiently navigating a superpressure balloon in the stratosphere requires the integration of a multitude of cues, such as wind speed and solar elevation, and the process is complicated by forecast errors and sparse wind measurements. Coupled with the need to make decisions in real time, these factors rule out the use of conventional control techniques. This talk describes the use of reinforcement learning to create a high-performing flight controller for Loon superpressure balloons. Our algorithm uses data augmentation and a self-correcting design to overcome the key technical challenge of reinforcement learning from imperfect data, which has proved to be a major obstacle to its application to physical systems...

The Future of PyMC3, or: Theano is Dead, Long Live Theano
TL;DR: PyMC3 on Theano with the new JAX backend is the future, PyMC4 based on TensorFlow Probability will not be developed further...With the ability to compile Theano graphs to JAX and the availability of JAX-based MCMC samplers, we are at the cusp of a major transformation of PyMC3. Without any changes to the PyMC3 code base, we can switch our backend to JAX and use external JAX-based samplers for lightning-fast sampling of small-to-huge models...

Tools*

Score 200 Free Cores with Coiled Cloud Before 3/12

Coiled, the company providing scalable data science and machine learning with Dask, turned 1 year old this month - and they want to give you a gift to celebrate.

They’re building Coiled Cloud, which provides hosted Dask clusters, docker-less managed software, and zero-click deployments. Through March 12th, all Coiled Cloud users get 200 free cores.

Burst to the cloud with your data science and ML workflows. Help Coiled burn their cloud startup credits. No credit card required.

Come to the Dask side! Sign up with Coiled today and get your free cores here.

*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!

Jobs

Senior Revenue Data Scientist - Mozilla - Remote

Come join a company with open-source at its heart! Mozilla is wholly owned by a non-profit and strives to build products that keep the Internet open, accessible, and secure for everyone. You’ll be part of our Data Org where you’ll join a talented team of Data Scientists and Data Engineers. We have a mature data pipeline that processes terabytes of data per day.

We’re looking for a Senior Data Scientist to join our Revenue Data Science team. You’ll work with a cross-functional team to understand and strengthen Mozilla’s financial health. You’ll have a chance to collaborate with folks from across the company and have a visible impact on our success....

Want to post a job here? Email us for details >> team@datascienceweekly.org

Training & Resources

10 new lectures on Reinforcement Learning with lots of examples [Video]
This is based on David Silver's course but targeting younger students within a shorter 50min format (missing the advanced derivations) + more examples and Colab code...

Theoretical Foundations of Graph Neural Networks [Video]
Deriving graph neural networks (GNNs) from first principles, motivating their use, and explaining how they have emerged along several related research lines...

Prediction Intervals for Gradient Boosting Regression
This example shows how quantile regression can be used to create prediction intervals...
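
As a rough illustration of the idea behind quantile regression (not the linked example's code), the sketch below uses only the standard library. It assumes the pinball loss as the objective and fits the simplest possible model, a constant, at three quantiles; the helper names `pinball_loss` and `best_constant` are our own.

```python
def pinball_loss(y_true, y_pred, q):
    """Average pinball (quantile) loss for quantile level q."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        diff = y - p
        # Under-predictions are weighted by q, over-predictions by (1 - q).
        total += q * diff if diff >= 0 else (q - 1) * diff
    return total / len(y_true)


def best_constant(y, q):
    """Pick the observed value that minimizes the pinball loss at level q."""
    return min(sorted(set(y)), key=lambda c: pinball_loss(y, [c] * len(y), q))


y = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
lower = best_constant(y, 0.1)   # lower end of the interval
median = best_constant(y, 0.5)  # point prediction
upper = best_constant(y, 0.9)   # upper end of the interval
```

Minimizing the loss at levels 0.1 and 0.9 gives the two ends of a nominal 80% interval. A gradient boosting model trained on the same loss does this conditionally on the features rather than with a single constant.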

Books

Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits
Integrate scikit-learn with various tools such as NumPy, pandas, imbalanced-learn, and scikit-surprise and use it to solve real-world machine learning problems...

For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.

P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian

Data Science Weekly Newsletter