Data Science Weekly - Issue 626
Curated news, articles and jobs related to Data Science, AI, & Machine Learning
Issue #626
November 20, 2025
Hello!
Once a week, we write this email to share the links we thought were worth sharing in the Data Science, ML, AI, Data Visualization, and ML/Data Engineering worlds.
And now…let's dive into some interesting links from this week.
Editor's Picks
Why I’m Making the Switch to marimo Notebooks
After years of using Jupyter Lab, I have moved most of my work to marimo notebooks, a new kind of Python notebook that addresses many long-standing issues with traditional ones. This article covers the reasons behind my transition and how marimo fits naturally into my current workflow, with full gratitude to Project Jupyter for building the notebook ecosystem that shaped data science, research and education…
Hannes Mühleisen - Data Architecture Turned Upside Down
Every data architecture diagram out there makes it abundantly clear who’s in charge: At the bottom sits the analyst, above that is an API server, and on the very top sits the mighty data warehouse. This pattern is so ingrained we never ever question its necessity, despite its various issues like slow data response time, multi-level scaling issues, and massive cost…In this talk, it will be discussed how modern data engineering paradigms like decomposition of storage, single-node query processing, and lakehouse formats enable a radical departure from the tired three-tier architecture. By inverting the architecture we can put user’s needs first. We can rely on commoditised components like object store to enable fast, scalable, and cost-effective solutions…
Make Things, Tell People
I got my first job post-graduate school via a board game weekend. Not a tech conference, not a networking event. I mentioned the research I was doing to another graduate student who matched me with someone looking to hire for very similar work…Years later, with a lot more experience, I’m still currently not finding work through traditional application processes…
Data Science Articles & Videos
Data visualization exercise repositories
This contains links to all the standalone repositories I use for the third generation of assignments for my dataviz class, starting in Fall 2025. Each is in its own repository so it’s easier to (1) make copies in the future and (2) deploy these projects to Posit.cloud without needing to upload a bunch of individual files or move them around after uploading a .zip there…
Where to Go After Data Science: Unconventional / Weird Exits? [Reddit]
Data science careers often feel like they funnel into the same few paths—FAANG, ML/AI engineering, or analytics leadership—but people actually branch into wildly unexpected directions. I’m curious about those off-the-beaten-path exits: roles in unexpected industries, analytics-adjacent pivots, international moves, or entirely new ventures. Would love to hear some stories…
TinyETL - Fast, zero-config ETL in a single binary
Transform and move data between any format or database instantly. No dependencies, no config files, just one command…
Why TinyETL?
Single 12.5MB binary — no dependencies, no installation headaches
180k+ rows/sec streaming — handles massive datasets efficiently
Zero configuration — automatic schema detection and table creation (override with schema and config files in yaml)
Lua transformations — powerful data transformations
Universal connectivity — CSV, JSON, Parquet, Avro, MySQL, PostgreSQL, SQLite, DuckDB, MSSQL, ODBC. Coming soon: Snowflake, Databricks, OneLake
Cross-platform — Linux, macOS, Windows ready…
Neuroevolution: Harnessing Creativity in AI Agent Design
Neuroevolution, or optimization of neural networks through evolutionary computation, has been a growing subarea of machine learning since the 1990s. Its primary focus is on evolving neural networks for intelligent agents when the training targets are not known, and good performance requires many decisions over time, such as robotic control, game playing, and decision-making. More recently it has also been extended to optimizing deep-learning architectures, understanding how biological intelligence evolved, and optimizing neural networks for hardware implementation. This book introduces students to the basics of neuroevolution, progresses to several advanced topics that make neuroevolution more effective and more general, reviews example application areas, and proposes further research questions…
RegreSQL: Regression Testing for PostgreSQL Queries
RegreSQL brings PostgreSQL’s regression testing methodology to your application queries, catching both correctness bugs and performance regressions before production…
Optimising VS Code and Positron for Quarto: Essential Settings for Better Editing
Discover the custom settings I use in VS Code and Positron to enhance my Quarto document editing workflow, from improved Git diffs to better visual guides for nested divs…
Google Research: DS-STAR: A state-of-the-art versatile data science agent
DS-STAR is a state-of-the-art data science agent whose versatility is shown by its ability to automate a range of tasks — from statistical analysis to visualization and data wrangling — across various data types, culminating in a top-ranking performance on the famous DABStep benchmark…
I’ve been thinking about Agents and MCP all wrong
For the last cough 20 cough years I’ve built data processing pipelines, either for real or as examples based on my previous experience. It’s the same pattern, always: Data comes in, Data gets processed, Data goes out…Perhaps I’m too literal, perhaps I’m cynical after too many years of vendor hype, or perhaps it’s just how my brain is wired—but I like concrete, tangible, real examples of something. So when it comes to agents, particularly with where we’re at in the current hype-cycle, I really wanted to have some actual examples on which to build my understanding. In addition, I wanted to build some of my own. But where to start?…
gggenomes - A grammar of graphics for comparative genomics
gggenomes is a versatile graphics package for comparative genomics. It extends the popular R visualization package ggplot2 by adding dedicated plot functions for genes, syntenic regions, etc. and verbs to manipulate the plot to, for example, quickly zoom in into gene neighborhoods…
The Mighty Simplex
Indeed, simplices, or simplexes, arise in a wide range of geometrical problems and real-world applications. For instance, metallic alloys are described on a simplex to identify the constituent elements…Zero-sum games in game theory and ecosystems in population dynamics are described on simplexes, and the Dantzig simplex algorithm is a central algorithm for optimization in linear programming. Simplexes also are used in nonlinear minimization (amoeba algorithm), in classification problems in machine learning, and they also raise their heads in quantum gravity. These applications reflect the special status of the simplex in the geometry of high dimensions. … It’s Simplexes all the way down!…
In this post I discuss some things to consider if you are choosing which of these two excellent packages—that’s important: they are both excellent packages—to use in various situations…
Explain like I’m 5: What are “data products” and “data contracts” [Reddit]
I’ve been seeing mention of “data products” and “data contracts” for some time. I think I get the concepts, but... 🤷‍♂️ How far off am I?
Data product: Something valuable using data? Tangible? Physical? What’s “physical” when we’re talking about virtual, digital things? Is it a dataset/model, report, or something more? Is this just a different word for “solution”? Is it just the terminology for those things nowadays?
Data contract: This is some kind of agreement that data producer/provider doesn’t change a data structure/schema without due process involving the data consumer? Do people actually do this, to good effect? I deal with source data where the vendor changes shit willy-nilly. And other sources where business users can create the dreaded custom field. Maybe I’m cynical, but I can’t see these parties changing those practices readily…
SQL CASE FILES Bureau Database Analysis Division
Welcome to the Bureau’s analytical division. Your expertise in data investigation is required for several high-priority cases…
Last Week's Newsletter's 3 Most Clicked Links
Python is not a great language for data science. Part 1: The experience
I’m a co-founder hiring ML engineers and I’m confused about what candidates think our job requires
Learnings after 4 years working with +50 companies on data engineering projects
* Based on unique clicks.
** Please take a look at last week's issue #625 here.
Cutting Room Floor
The Rise of “Mindless” TV: Quantifying a New Way of (Kinda) Watching Television
Pyrefly - A fast type checker and language server for Python with powerful IDE features
How to Decide Between Regression and Time Series Models for “Forecasting”? [Reddit]
DSPy Events - Connect with the DSPy community at events around the world
OpenCodePapers: Collecting benchmark results and code links of research papers
When is a result statistically significant but still useless? [Reddit]
Whenever you're ready, 2 ways we can help:
Looking to get a job? Check out our “Get A Data Science Job” Course
It is a comprehensive course that teaches you everything you need to know about getting a data science job, based on answers to thousands of reader emails like yours. The course has three sections: Section 1 covers how to get started, Section 2 covers how to assemble a portfolio to showcase your experience (even if you don’t have any), and Section 3 covers how to write your resume.
Promote yourself/organization to ~68,750 subscribers by sponsoring this newsletter. 30-35% weekly open rate.
Thank you for joining us this week! :)
Stay Data Science-y!
All our best,
Hannah & Sebastian



Thanks for putting this together each week. Always useful to stay current with what's happening in the field.