[in case you missed it] Data Science Weekly - Issue 385
Issue #385 Apr 08 2021
Editor Picks
How I built a €25K Machine Learning Rig
How to plan, buy, build, and store your 2-10 GPU machine learning servers and PCs...
Rip van Winkle's Razor, a Simple New Estimate for Adaptive Data Analysis
Can you trust a model whose designer had access to the test/holdout set? This implicit question in Dwork et al. (2015) launched a new field, adaptive data analysis...This blog post concerns our new paper, which gives meaningful upper bounds on this sort of trouble for popular deep net architectures, whereas prior ideas from adaptive data analysis gave no nontrivial estimates. We call our estimate Rip van Winkle’s Razor, a name that combines references to Occam’s Razor and the fictional character who fell asleep for 20 years...
Deep learning model compression
This post covers model inference optimization or compression in breadth and hopefully depth as of March 2021. This includes engineering topics like model quantization and binarization, more research-oriented topics like knowledge distillation, as well as well-known-hacks...
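Quantization is one of the techniques the post surveys. To illustrate the core idea (this is not code from the post), here is a minimal sketch of symmetric, per-tensor 8-bit quantization in plain Python. The function names are hypothetical, and real toolkits add per-channel scales, zero points, and calibration on top of this.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    # One shared scale for the whole tensor; guard against an all-zero tensor.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # round() rounds half to even; clamp in case of floating-point overshoot.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from the integer codes."""
    return [v * scale for v in q]


if __name__ == "__main__":
    w = [0.42, -1.27, 0.003, 0.9]
    q, s = quantize_int8(w)
    print(q, s)
    print(dequantize(q, s))
```

Each weight is stored in 8 bits instead of 32, at the cost of a rounding error of at most half the scale per weight; small values (like 0.003 here) can collapse to zero.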
A Message from this week's Sponsor:
Kickstart Your New Career with a Data Science & Analytics Bootcamp
Bootcamps are starting soon! Don’t miss your chance to join a Data Scientist-led, online Metis bootcamp with career support until you’re hired. Ready to take your data science or analytics career to the next level? Learn more about the Metis Online Data Science & Analytics Bootcamps.
Data Science Articles & Videos
Progressively approaching Kaggle
After mentoring ~100 folks in Machine Learning during the last ~3 yrs, this is my experience of what seems to be working best to get more people onto Kaggle...Kaggle's Tabular Playground Series (TPS) is a great place to start with a progressive plan. Be patient, work hard & keep practising...
Mapping Mangrove Growth and Deforestation with Satellite Imagery
I’ve trained a convolutional neural network on year 2000 satellite image data in order to classify pixels as mangroves or not-mangroves. I’ve applied this model to 2020 data in order to track changes in mangroves over the last 20 years for the Florida coastline...The approach detailed here can be a powerful tool to automate remote monitoring of these extraordinary and special habitats...
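Once a classifier has labeled every pixel in both years, tracking change reduces to comparing the two masks pixel by pixel. A minimal sketch, assuming two same-sized binary masks (1 = mangrove, 0 = not mangrove) already produced by some model; the function name is hypothetical and this is not the author's pipeline.

```python
from collections import Counter


def mask_change(mask_old, mask_new):
    """Count per-pixel transitions between two binary masks (1 = mangrove)."""
    if len(mask_old) != len(mask_new) or any(
        len(a) != len(b) for a, b in zip(mask_old, mask_new)
    ):
        raise ValueError("masks must have the same shape")
    counts = Counter()
    for row_old, row_new in zip(mask_old, mask_new):
        for old, new in zip(row_old, row_new):
            if old and not new:
                counts["lost"] += 1    # mangrove in the earlier year only
            elif new and not old:
                counts["gained"] += 1  # mangrove in the later year only
            elif old and new:
                counts["stable"] += 1  # mangrove in both years
    return {"lost": counts["lost"], "gained": counts["gained"], "stable": counts["stable"]}


if __name__ == "__main__":
    m2000 = [[1, 1, 0],
             [0, 1, 0]]
    m2020 = [[1, 0, 0],
             [1, 1, 1]]
    print(mask_change(m2000, m2020))  # {'lost': 1, 'gained': 2, 'stable': 2}
```

Multiplying the counts by the ground area of one pixel would turn them into area estimates; that area depends on the imagery used.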
Dotplot – the single most useful yet largely neglected dataviz type
I have to confess that the core message of this post is not really a fresh saying. But if I were given a chance to deliver one piece of dataviz advice to every (ha-ha-ha) listening mind, I’d choose this: forget multi-category bar plots and use dotplots instead...
Relighting And Color Grading With Machine Learning
Continuity can be a challenge for [Photo/Video] shoots that are plagued by varying weather conditions, where, for instance, the pick-up shots are in bright sunlight but the core footage was shot under an even layer of cloud... Neural networks have been brought to bear on the problem for a few years now. In 2019 a Google-led academic collaboration presented a novel neural network that implemented a rudimentary process for relighting, though the results were not entirely convincing[2]...In 2020 another collaboration, this time between Amazon, Adobe and the University of Maryland, developed a relighting algorithm capable of working on portraits as large as 1024x1024 – which is pretty HD for the image synthesis space, at least at the moment...
Investigation: Robotic Process Automation (RPA)
RPA bots take control of your computer (mouse and keyboard) and do the boring repetitive tasks that we humans are too often stuck with. They do so through a series of conditions, loops, and commands, coded to control the graphical user interface...In the Jupyter Notebook below, I discuss one of the more popular packages, PyAutoGUI. I walk through setting up PyAutoGUI, popular functions, and a demo. The demo shows a use case where a user needs to download multiple monthly building permit text files from the Census Bureau...
What is Similarity Search?
What does Euclid of Alexandria have to do with Nike shoes? Find out in this intro to similarity search...
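At its core, similarity search represents items as vectors and returns the ones closest to a query; Euclidean distance is one common choice of metric. A minimal brute-force sketch in plain Python: the item names and 3-dimensional "embeddings" are made up for illustration, and production systems typically use approximate nearest-neighbor indexes instead of scanning every vector.

```python
import heapq
import math


def knn(query, vectors, k=2):
    """Exhaustive k-nearest-neighbor search by Euclidean distance.

    `vectors` maps item names to equal-length coordinate lists.
    Returns (name, distance) pairs, closest first.
    """
    scored = ((math.dist(query, vec), name) for name, vec in vectors.items())
    return [(name, dist) for dist, name in heapq.nsmallest(k, scored)]


if __name__ == "__main__":
    # Hypothetical embeddings for three products.
    items = {
        "running_shoe": [0.9, 0.1, 0.0],
        "trail_shoe": [0.8, 0.2, 0.1],
        "sandal": [0.1, 0.9, 0.3],
    }
    for name, dist in knn([0.9, 0.1, 0.05], items, k=2):
        print(f"{name}: {dist:.3f}")
```

Brute force costs one distance computation per stored vector per query, which is exactly the cost that indexing structures try to avoid at scale.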
How to Create Mathematical Animations like 3Blue1Brown Using Python
Wouldn’t it be cool if you could learn how he created these animations so you could create similar animations to explain some data science concepts to your teammates, managers, or followers?...Luckily, Grant put together a Python package called manim that enables you to create mathematical animations or pictures using Python. In this article, you will learn how to create mathematical animations like his using manim...
Using Google’s Speech-to-Text API with Python
This post provides steps and Python syntax for using the Google Cloud Platform speech transcription service...
Branch Specialization
This article describes branch specialization, one of three larger “structural phenomena” we’ve been able to observe in neural networks. (The other two, equivariance and weight banding, have separate dedicated articles.) Branch specialization occurs when neural network layers are split up into branches. The neurons and circuits tend to self-organize, clumping related functions into each branch and forming larger functional units – a kind of “neural network brain region.” We find evidence that these structures implicitly exist in neural networks without branches, and that branches are simply reifying structures that otherwise exist...
A Bot that Bird Watches so You Don’t Have To
If you have ever tried to watch a live nest camera hoping to observe a falcon or other interesting bird, you may have had the experience of opening the live stream and seeing an empty nest. Not sure how long you should wait for the bird to return? In this article, I will describe my project aimed at automating nest monitoring for the Nottingham Trent University Falcon Cam. Using deep learning and automation tools, I designed a method for 24-hour bird detection that serves as an infrastructure upon which a Twitter bot or other notification system can be built to notify users when the bird enters or leaves the nest...
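The article leaves the notification layer open ("a Twitter bot or other notification system"). One hypothetical way to turn frame-level detections into notifications is to debounce them: report that the bird entered or left only after several consecutive frames agree. A minimal sketch; `nest_events` and `min_frames` are illustrative names, not from the project, and the detector producing one boolean per frame is assumed.

```python
def nest_events(detections, min_frames=3):
    """Turn per-frame bird detections (True/False) into 'entered'/'left' events.

    A state change is reported only after `min_frames` consecutive frames
    disagree with the current state, which suppresses flicker from occasional
    missed or false detections. Each event carries the index of the frame at
    which the change was confirmed.
    """
    events = []
    present = False  # assume the nest starts empty
    streak = 0       # consecutive frames contradicting the current state
    for i, detected in enumerate(detections):
        if detected != present:
            streak += 1
            if streak >= min_frames:
                present = detected
                streak = 0
                events.append((i, "entered" if present else "left"))
        else:
            streak = 0
    return events


if __name__ == "__main__":
    frames = [False, True, False, True, True, True, True, False, False, False]
    print(nest_events(frames))  # [(5, 'entered'), (9, 'left')]
```

The single stray detection at frame 1 produces no event; a bot would post a message for each tuple returned.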
Infrastructure / Tools *
What is Data Observability?
Investing in data observability is becoming increasingly important as companies collect more and more (often third-party) data. Introducing new data sources and expanding access to new data consumers also leads to more complex pipelines—which increases the opportunities for missing, stale, or duplicate data to affect your business. Barr Moses, CEO and co-founder of Monte Carlo, explains how data engineers can leverage best practices from DevOps and engineering to fix data quality at scale.
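The failure modes named here (missing, stale, and duplicate data) map onto simple automated checks. A minimal sketch in plain Python; the function name, the list-of-dicts row format, and the thresholds are assumptions for illustration, not Monte Carlo's product or API.

```python
from datetime import datetime, timedelta, timezone


def check_table(rows, key, timestamp_field, max_age, now=None):
    """Return a list of data-quality issues: missing values, staleness, duplicate keys."""
    now = now or datetime.now(timezone.utc)
    if not rows:
        return ["table is empty"]
    issues = []
    # Missing: any row with at least one None value.
    missing = sum(1 for r in rows if any(v is None for v in r.values()))
    if missing:
        issues.append(f"{missing} row(s) with missing values")
    # Stale: newest timestamp older than the allowed age.
    stamps = [r[timestamp_field] for r in rows if r.get(timestamp_field) is not None]
    if not stamps:
        issues.append("no timestamps")
    elif now - max(stamps) > max_age:
        issues.append(f"stale: newest record is {now - max(stamps)} old")
    # Duplicate: the same key value appearing more than once.
    keys = [r.get(key) for r in rows]
    dupes = len(keys) - len(set(keys))
    if dupes:
        issues.append(f"{dupes} duplicate key(s)")
    return issues


if __name__ == "__main__":
    utc = timezone.utc
    rows = [
        {"id": 1, "amount": 10.0, "updated_at": datetime(2021, 4, 6, tzinfo=utc)},
        {"id": 2, "amount": None, "updated_at": datetime(2021, 4, 6, tzinfo=utc)},
        {"id": 2, "amount": 7.5, "updated_at": datetime(2021, 4, 5, tzinfo=utc)},
    ]
    now = datetime(2021, 4, 8, 12, tzinfo=utc)
    for issue in check_table(rows, "id", "updated_at", timedelta(days=1), now=now):
        print(issue)
```

In practice such checks would run on a schedule against each table and feed an alerting channel, in the spirit of the DevOps monitoring the post refers to.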
*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!
Jobs
Lead Data Scientist - Butter - SF / Remote
Butter is the payments stack for the modern subscription economy. We're building an inclusive remote-first culture that enables our employees to contribute from anywhere in the world. We believe that anyone can have an impact so we want you to join our team to modernize subscription payments... You will predict the future to drive value for our customers.
Our customers rely on us to help them recapture lost revenue and drive insights into their payments data. You will help us uncover what we don't know and should be looking at, and propose new areas for us to analyze and model. We'll also look to you to suggest new products we should consider offering customers. You'll ensure we have clean and accurate data for both reporting and modeling purposes. You will build models, data pipelines, and business-critical reports. You'll visualize the data for consumption both internally and externally (customer-facing)...
Want to post a job here? Email us for details >> team@datascienceweekly.org
Training & Resources
UC Berkeley's Designing, Visualizing and Understanding Deep Neural Networks Class [57 Videos]
Broad overview of deep learning topics:
• Neural network architectures
• Optimization algorithms
• Applications: vision, NLP
• Reinforcement learning
• Advanced topics...
MLOps
Learn how to apply ML to build a production grade product and deliver value...
Statistics 110: Probability
Statistics 110 (Probability) has been taught at Harvard University by Joe Blitzstein (Professor of the Practice in Statistics, Harvard University) each year since 2006. The on-campus Stat 110 course has grown from 80 students to over 300 students per year in that time. Lecture videos, review materials, and over 250 practice problems with detailed solutions are provided. This course is an introduction to probability as a language and set of tools for understanding statistics, science, risk, and randomness...
Books
Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits
Integrate scikit-learn with various tools such as NumPy, pandas, imbalanced-learn, and scikit-surprise and use it to solve real-world machine learning problems...
For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.
P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian