[in case you missed it] Data Science Weekly - Issue 370

Dec 27, 2020

Issue #370 Dec 24 2020

Editor Picks

NeRF Explosion 2020
Besides the COVID-19 pandemic and political upheaval in the US, 2020 was also the year in which neural volume rendering exploded onto the scene, triggered by the impressive NeRF paper by Mildenhall et al. This blog post is my way of getting up to speed in a fascinating and very young field and sharing my journey with you...

Taking Questions from the Late Justice Ginsburg: Fine-Tuning Billion+ Parameter Transformers Using Model Parallelism
We’ll never know what Justice Ginsburg might have asked had she completed the current term of the Court. However, in memory of the Justice, we can use her words from decades of oral arguments to get a sense of some of the questions she might have posed. In this example, we will create a persona-based dialogue model of Justice Ginsburg using the largest versions of gpt2 and t5, fine-tuning models of 1.5 billion and 11 billion parameters respectively in just a few hours using model parallelism...

2020: A Year Full of Amazing AI Papers - A Review
A curated list of the latest breakthroughs in AI by release date with a clear video explanation, link to a more in-depth article, and code...

A Message from this week's Sponsor:

Data scientists are in demand on Vettery

Vettery is an online platform that connects you with thousands of actively hiring startups and Fortune 500 companies. Create a free profile, name your salary, and get discovered by hiring managers looking to grow their teams.

Get started - it’s completely free for job-seekers!

Data Science Articles & Videos

Mastering Atari, Go, chess and shogi by planning with a learned model
In 2016, AlphaGo was introduced. Two years later, its successor - AlphaZero - showed significant progress in Go, chess and shogi. Today in Nature, our team describes MuZero, a significant step forward in the pursuit of general-purpose algorithms...

DeepMind's AI agent MuZero could turbocharge YouTube
DeepMind's latest AI program can attain "superhuman performance" in tasks without needing to be given the rules. But unlike its predecessors, it had to work out the rules for itself. It is already being put to practical use to find a new way to encode videos, which could slash YouTube's costs...

Homemade Machine Learning
The purpose of this repository is not to implement machine learning algorithms by using 3rd party library one-liners but rather to practice implementing these algorithms from scratch and get a better understanding of the mathematics behind each algorithm. That's why all algorithm implementations are called "homemade" and are not intended to be used for production...

Reengineering Facebook AI’s deep learning platforms for interoperability
Having a single set of interchangeable building blocks across different AI subfields will help accelerate progress. We’re reengineering our platforms, using Hydra for configs, and offering greater integration with PyTorch Lightning...

Hypersim: A Photorealistic Synthetic Dataset for Holistic Indoor Scene Understanding
Photorealistic synthetic scenes have the advantage of giving us as many ground truth layers as we want to train an ML system. But is it enough for sim2real?...

Optimization is as hard as approximation
You asked for an efficient algorithm for non-convex optimization for Christmas? It won’t be possible unless you have a lot of smoothness. See why in this month's blog post...

Demo: Doing Data Science on Remote Data
Andrew Trask and Mat Leonard will demonstrate how to analyze data on someone else’s machine using PySyft and Duet...

AlphaFold 2 & Equivariance
A few weeks ago, in the latest CASP competition for protein structure prediction (CASP14), DeepMind’s AlphaFold 2 outperformed all its competitors by an unprecedented margin. In this blog post, we aim to shed light on one of the important building blocks that distinguishes AlphaFold 2 from the other approaches and likely contributed to its success: an equivariant structure prediction module...

Extracting Training Data from Large Language Models
It has become common to publish large (billion parameter) language models that have been trained on private datasets. This paper demonstrates that in such settings, an adversary can perform a training data extraction attack to recover individual training examples by querying the language model. We demonstrate our attack on GPT-2, a language model trained on scrapes of the public Internet, and are able to extract hundreds of verbatim text sequences from the model's training data...

Training*

Quick Question For You: Do you want a Data Science job?

After helping hundreds of readers like you get Data Science jobs, we've distilled all the real-world-tested advice into a self-directed course.

The course is broken down into three guides:

Data Science Getting Started Guide. This guide shows you how to figure out the knowledge gaps that MUST be closed in order for you to become a data scientist quickly and effectively (as well as the ones you can ignore)
Data Science Project Portfolio Guide. This guide teaches you how to start, structure, and develop your data science portfolio with the right goals and direction so that you are a hiring manager's dream candidate
Data Science Resume Guide. This guide shows how to make your resume promote your best parts, what to leave out, how to tailor it to each job you want, as well as how to make your cover letter so good it can't be ignored!

Click here to learn more ...

*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!

Jobs

Data Scientist - Apple Pay Analytics - NYC

You will play a key role improving the Apple Pay product experience. As a member of the analytics team you will be supporting a product function. You will partner with business owners, understand goals, craft KPIs and measure ongoing performance. You will initially engage with the product and engineering teams in ensuring that we have the appropriate instrumentation in place to deliver on these metrics. You will subsequently use advanced statistical, ML and analytical techniques to analyze product performance and identify key insights that inform product improvements and business strategy. The role requires a high degree of independence, ownership and collaboration working cross-functionally across all levels of a highly matrixed organization...

Want to post a job here? Email us for details >> team@datascienceweekly.org

Training & Resources

Docker for data scientists — Part 1
The basics. A quick guide for data scientists and machine learning engineers to get started with Docker...

Voice Separation with an Unknown Number of Multiple Speakers
We provide a PyTorch implementation of the paper "Voice Separation with an Unknown Number of Multiple Speakers", in which we present a new method for separating a mixed audio sequence in which multiple voices speak simultaneously...

NumPy Illustrated: The Visual Guide to NumPy
Brush up your NumPy or learn it from scratch...

Books

Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits
Integrate scikit-learn with various tools such as NumPy, pandas, imbalanced-learn, and scikit-surprise and use it to solve real-world machine learning problems...

For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.

P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian

Data Science Weekly Newsletter