Data Science Weekly - Issue 643
Curated news, articles and jobs related to Data Science, AI, & Machine Learning
Issue #643
March 19, 2026
Hello!
Once a week, we write this email to share the links we thought were worth sharing in the Data Science, ML, AI, Data Visualization, and ML/Data Engineering worlds.
And now…let’s dive into some interesting links from this week.
Editor's Picks
The Boon of Dimensionality
I recently watched Grant Sanderson’s (3blue1brown) video about the volume of high-dimensional spheres. He noted that high-dimensional space is also of particular interest to the ML field. I knew of the curse of dimensionality, but I wanted to dig deeper, and here is what I found on the other side: the boon of dimensionality…
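For readers who want to see the geometry behind that video in numbers, here is a minimal sketch. It uses the standard closed form for the volume of the unit n-ball, V_n = pi^(n/2) / Gamma(n/2 + 1), which is textbook material rather than anything taken from the linked post. The function names `unit_ball_volume` and `shell_fraction` are our own, chosen for illustration.

```python
import math


def unit_ball_volume(n: int) -> float:
    """Volume of the unit ball in n dimensions: pi^(n/2) / Gamma(n/2 + 1)."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1)


def shell_fraction(n: int, eps: float = 0.01) -> float:
    """Fraction of the unit ball's volume lying within eps of its surface.

    Volume scales as r^n, so the inner ball of radius (1 - eps) holds
    (1 - eps)^n of the total; the thin outer shell holds the rest.
    """
    return 1.0 - (1.0 - eps) ** n


if __name__ == "__main__":
    for n in (1, 2, 3, 5, 10, 20, 100, 500):
        print(f"n={n:>3}  volume={unit_ball_volume(n):.3e}  "
              f"outer 1% shell={shell_fraction(n):.3f}")
```

Two things stand out when you run it: the volume of the unit ball peaks at n = 5 and then shrinks toward zero, and in high dimensions nearly all of that volume sits in a thin shell next to the surface.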
The Emerging Science of Machine Learning Benchmarks
Why benchmarks advanced machine learning, the crisis they now face, and the science we need to sustain progress…
The Best Tacit Knowledge Videos on Every Subject
Tacit knowledge is extremely valuable. Unfortunately, developing tacit knowledge is usually bottlenecked by apprentice-master relationships…This post is a Schelling point for aggregating these videos—aiming to be The Best Textbooks on Every Subject for Tacit Knowledge Videos…Experts in the videos include Stephen Wolfram, Holden Karnofsky, Andy Matuschak, Jonathan Blow, Tyler Cowen, George Hotz, and others…
Data Science Articles & Videos
Ziptable: Share small datasets as a link — no server, no storage, no account
Ziptable lets you share a small CSV or JSON dataset by sending a single link. The person you send it to opens the link and immediately sees the data in their browser, ready to search, inspect, and download again. No attachments, no cloud storage workflow, and no account required…
Over the past few months, I have not really written code from scratch, not for production, mostly exploratory work. This makes me question my place on the team. We have a lot of staff and senior staff level data scientists who are older and historically not as strong in Python as I am. But recently, I have seen them produce analyses using Python that they would have needed my help with before AI. This makes me wonder if the ideal candidate in today’s market is someone with strong subject matter expertise, and coding skill just needs to be average rather than exceptional…
Probabilistic model-agnostic survival analysis using scikit-learn, glmnet, xgboost, lightgbm, pytorch, keras, nnetsauce and mlsauce
Survival analysis is a group of Statistical/Machine Learning (ML) methods for predicting the time until an event of interest occurs…In this post, I show how to use scikit-learn, glmnet, xgboost, lightgbm, pytorch, keras, nnetsauce and mlsauce in conjunction with Python package survivalist for probabilistic survival analysis. The probabilistic part is based on conformal prediction and Bayesian inference, and graphics represent the out-of-sample ML survival function vs Empirical Kaplan-Meier survival function (with confidence intervals)…
An interactive presentation about the Grammar of Graphics
The Grammar of Graphics: A theoretical framework created by Leland Wilkinson that provides a structured way to describe and build data visualizations by breaking them down into semantic layers…
Has AI Coding Made DQ Better Or Worse? A Data Driven Exploration
I analyzed 1,000 troubleshooting investigations from the past month across hundreds of data environments. Using an LLM-assisted clustering approach combined with manual review, I categorized root causes into several classes, including data source issues, system failures, and code changes. From that analysis, I determined the percentage of data quality issues resulting from code-based issues, as well as the types of code-based issues still occurring…
Data Organization in Spreadsheets
Spreadsheets are widely used software tools for data entry, storage, analysis, and visualization. Focusing on the data entry and storage aspects, this article offers practical recommendations for organizing spreadsheet data to reduce errors and ease later analyses…
London’s Divide Was Called Character. It Was Actually Policy.
I built a machine learning model to find London’s divide, and you can enter your postcode to see which side you’re on. We’ve been blaming the wrong people for it…
A Decade of Slug
What is now known as the Slug Algorithm for rendering fonts directly from Bézier curves on the GPU was developed in the Fall of 2016, so this year marks a full decade since its inception…Since then, Slug has been licensed widely in the video games industry as well as by an array of companies specializing in areas like scientific visualization, CAD, video editing, medical equipment, and even planetariums…This post talks about what has changed within the rendering method since 2017, when the paper was published and the Slug Library was first released. It then concludes with an exciting announcement for those who may want to implement the Slug algorithm for their own projects…
Is AutoML Dead? Or is it just resting?
“LLMs can write code, so you can fire your data scientists.” Oh, wait…AutoML hype peaked in 2019. On May 28, 2019, Forrester published The Forrester New Wave: Automation-Focused Machine Learning Solutions. The report covered DataRobot, H2O.ai, and a bunch of no-hopers, including Aible, Bell Integrator, Big Squid, DMway, dotData, EdgeVerve, and Squark. Where are they now? Seven years after Forrester evaluated “automation-focused machine learning solutions”, where are those companies now?…
New course on generative AI for behavioral science
This winter, at Northwestern, I taught a grad seminar on Generative AI for Social Science. The goal was to survey emerging applications of generative AI (mostly language model agents) in the social sciences, with special attention to methodological and metascientific concerns that come up when AI is used to simulate or substitute for human observations or labels…
Transformers as Constrained Optimization
Rewriting a pre-norm decoder-only transformer as a mixed-geometry constrained splitting scheme: RMSNorm as radial gauge fixing, attention as an entropy- or KL-constrained simplex solve, and residual branches as Euclidean trust-region steps…
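To make the "attention as a simplex solve" idea a little more concrete, here is a small sketch of one standard fact behind it: softmax(s / tau) is the point on the probability simplex that maximizes <p, s> + tau * H(p), where H is Shannon entropy. This is the entropy-regularized form, and we are assuming it is close in spirit to the post's framing, not reproducing the author's exact derivation. All function names below are illustrative.

```python
import math
import random


def softmax(scores, tau=1.0):
    """Numerically stable softmax of scores / tau."""
    m = max(scores)
    exps = [math.exp((s - m) / tau) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def objective(p, scores, tau=1.0):
    """<p, scores> + tau * H(p), with H the Shannon entropy (0 log 0 := 0)."""
    linear = sum(pi * si for pi, si in zip(p, scores))
    entropy = -sum(pi * math.log(pi) for pi in p if pi > 0.0)
    return linear + tau * entropy


def random_simplex_point(k, rng):
    """Sample a point on the probability simplex by normalizing exponentials."""
    w = [rng.expovariate(1.0) for _ in range(k)]
    total = sum(w)
    return [x / total for x in w]


rng = random.Random(0)
scores = [2.0, -1.0, 0.5, 0.0]  # stand-in for one row of attention logits

p_star = softmax(scores)
best = objective(p_star, scores)

# Count how many random candidate distributions beat the softmax solution.
beaten = sum(
    objective(random_simplex_point(len(scores), rng), scores) > best + 1e-12
    for _ in range(10_000)
)

if __name__ == "__main__":
    print("softmax weights:", [round(x, 4) for x in p_star])
    print("objective at softmax:", best)
    print("random candidates that beat it:", beaten)
```

At the optimum the objective equals tau times the log-sum-exp of the scaled scores, which is a handy sanity check on the implementation.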
How to take the next step? [Reddit]
Going on 1YOE as a data scientist at a small consulting company. Have a STEM degree but no masters. Current role is as a contractor, so around full time work, but I am looking to transition into something more stable. Is making the jump to a bigger company's DS team possible without a masters?
Atmospheric Simulation in R
One of the hardest details to get right in 3D data visualization is lighting: since 2D screens don’t offer the benefit of stereoscopic vision, we have to rely on subtle cues like occlusion, relative motion (if animated), and shading to be able to interpret the space as 3D…
Last Week's Newsletter's 3 Most Clicked Links
* Based on unique clicks.
** Please take a look at last week's issue #642 here.
Whenever you're ready, 2 ways we can help:
Looking to get a job? Check out our “Get A Data Science Job” Course
It is a comprehensive course that teaches you everything you need to know about getting a data science job, based on answers to thousands of reader emails like yours. The course has three sections: Section 1 covers how to get started, Section 2 covers how to assemble a portfolio to showcase your experience (even if you don’t have any), and Section 3 covers how to write your resume.
Promote yourself/organization to ~68,750 subscribers by sponsoring this newsletter. 30-35% weekly open rate.
Thank you for joining us this week! :)
Stay Data Science-y!
All our best,
Hannah & Sebastian


