Data Science Weekly - Issue 623

Curated news, articles and jobs related to Data Science, AI, & Machine Learning

Oct 30, 2025

Issue #623

Hello!

Once a week, we write this email to share the links we thought were worth sharing in the Data Science, ML, AI, Data Visualization, and ML/Data Engineering worlds.

And now…let's dive into some interesting links from this week.

Editor's Picks

Teaching Computers to Read: Rachel Wagner-Kaiser Interview
Teaching Computers to Read is a book focused on best practices and real-life challenges of building NLP and AI solutions for businesses…When I started this journey, my colleague and I were a bit struck by the dichotomy of AI and NLP books: either very high-level AI books targeted towards business leaders that really didn’t incorporate any technical details, or books that were very into the nitty-gritty technical details but lacked a lot of practical, business-driven advice. We wanted to create something that bridged the two: practical, technical advice about building useful AI solutions for business problems…

3Blue1Brown - But what is a Laplace Transform?
Visualizing the most important tool for differential equations…
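For readers who want the formulas behind the video, here is the standard definition, the derivative rule that turns a differential equation into an algebraic one, and a worked first-order example. The example ODE is our own illustration, not taken from the video.

```latex
\begin{aligned}
F(s) = \mathcal{L}\{f\}(s) &= \int_0^\infty f(t)\, e^{-st}\, dt \\
\mathcal{L}\{f'\}(s) &= s\,F(s) - f(0) \\[4pt]
\text{Example: } y' + a\,y = 0,\ y(0) = y_0
  &\;\Rightarrow\; s\,Y(s) - y_0 + a\,Y(s) = 0 \\
  &\;\Rightarrow\; Y(s) = \frac{y_0}{s + a}
  \;\Rightarrow\; y(t) = y_0\, e^{-a t}
\end{aligned}
```

The last step uses the table entry \mathcal{L}\{e^{-at}\} = 1/(s + a), valid for Re(s) > -a.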
The Majority Of Your Users
The majority of your users don’t read your changelog. The majority of your users only upgrade to new versions when forced to. The majority of your users don’t know which version scheme your project uses. The majority of your users only read the documentation pages relevant to what they are trying to get done…

Featured Message

“Teaching Computers to Read”: New book for practical AI and NLP solutions

Successful AI solutions aren’t about chasing the newest model; they’re about solving the right problems in the right way. “Teaching Computers to Read” (out November 5 from CRC Press) focuses on what technical teams need to design, develop, deploy, and maintain useful NLP and AI solutions. Drawing on real-world experience and examples, the book offers actionable best practices to deliver adaptable, reliable AI systems that address business challenges with lasting, tangible value. Check out the code companion for hands-on practice! Learn more or check out the book on Amazon.

* Want to be featured in the newsletter? Email us for details --> team@datascienceweekly.org

Data Science Articles & Videos

Useful engineering management artifacts
When managing a growing organization, it can be useful to have certain document templates on hand. Here’s a collection of documents I find useful…
Thoughts Regarding Levelling Up as a Data Scientist [Reddit]
As I look for new opportunities, I see there are one or two skills in the job requirements that I don’t have. I am pretty sure I am not the only one in such a situation. How is everyone dealing with these kinds of things? Are you doing side projects to show that you can pull it off, or are you blindly honest about it, claiming that you can pick it up on the job?…
How We Did This Curiosity-Driven Research - An Open-Notebook Exploration of Emergent Grounding in LMs
A few days ago, we released a preprint¹ showing evidence that symbol grounding can emerge in language models….The dust of release has not yet settled, and already a friend, after reading the manuscript, messaged me (Martin):
“I wouldn’t have thought to link these things. What made you do it?”
That question stayed with us, as it exposed a quiet truth about research: we too often present it as a polished surface or a linear arc from premise to conclusion, when in practice it unfolds as a cartography of detours. What looks, in retrospect, like a straight line was, in fact, a constellation of hunches, failed starts, fragile insights, and long hours spent staring at phenomena no one else thought worth staring at…
Factors associated with: problems of using exploratory multivariable regression to identify causal risk factors
Many medical and epidemiological studies use multivariable regression to test whether several independent variables (exposures) are causal determinants of a health outcome. Where mutually adjusted regression coefficients are significant, the exposures are labelled as risk factors for the outcome. We call this study design “factors associated with.” In this article, we argue that this method is flawed due to a lack of reasoning about which variables are treated as confounders, multiple statistical testing, and post hoc interpretation of the results. In some cases, researchers use algorithmic or stepwise approaches to select exposure variables, which further exacerbates these problems…
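To make one of the named problems, multiple statistical testing, concrete, here is a minimal simulation we wrote (not the authors' code): regress an outcome on 20 exposures that are all pure noise and count how many coefficients look "significant". The function name, sample sizes, and the |t| > 1.96 cutoff (roughly p < 0.05 for large samples) are our assumptions.

```python
import numpy as np


def spurious_risk_factors(n=500, k=20, seed=0):
    """Regress a pure-noise outcome on k independent pure-noise exposures
    (mutually adjusted, ordinary least squares) and count coefficients
    with |t| > 1.96, i.e. roughly p < 0.05 two-sided."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, k))
    y = rng.normal(size=n)  # outcome unrelated to every exposure
    X1 = np.column_stack([np.ones(n), X])  # prepend an intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    dof = n - X1.shape[1]
    sigma2 = resid @ resid / dof  # residual variance estimate
    cov = sigma2 * np.linalg.inv(X1.T @ X1)  # OLS coefficient covariance
    t = beta / np.sqrt(np.diag(cov))
    return int(np.sum(np.abs(t[1:]) > 1.96))  # ignore the intercept


# Repeat the "study" many times and average the number of false hits.
hits = [spurious_risk_factors(seed=s) for s in range(200)]
print(np.mean(hits))
```

With 20 null exposures and a 5% threshold, about one spurious "risk factor" per study is expected on average, which is exactly the kind of result the article warns against reading as causal.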
Memory consumption, dataset size and performance: how does it all relate?
In this post we will talk about dataset size and how it influences the performance of your program. We will also give a few common-sense ways to improve your program’s speed by decreasing the dataset size. Finally, we will present experiments to quantify the speed improvements. But before we begin, a few definitions…
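One common way to shrink a dataset without dropping rows is to store numbers at a narrower precision. The sketch below is our example (we don't know whether the post uses this particular technique); it compares float64 and float32 footprints using NumPy's `nbytes`. The function name is hypothetical.

```python
import numpy as np


def downcast_savings(n_rows=1_000_000):
    """Return the memory footprint (in bytes) of the same values
    stored as float64 and as float32."""
    values64 = np.arange(n_rows, dtype=np.float64)
    values32 = values64.astype(np.float32)  # 4 bytes per value instead of 8
    return values64.nbytes, values32.nbytes


big, small = downcast_savings()
print(big, small)  # 8000000 4000000
```

Halving the bytes per value halves memory use, but float32 keeps only about 7 significant decimal digits, so check that the precision loss is acceptable for your data before downcasting.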
AI scientists history
AI Scientist as a term has been around for about 20 years. More recently, several papers and products have been described as AI Scientists. Here, I summarize some of those, the history of the term, and give some ideas on what might define an AI Scientist…
Of course, someone has to write imperative code to build reproducible data science pipelines. It doesn’t have to be you.
The goal is to build a data science pipeline. The example here is purely illustrative, and compares a Nix-based approach to a non-Nix-based approach. So, I built the same polyglot Real Business Cycle model pipeline twice. First, I did it without {rixpress} or {rix}, using a combination of Docker, Make, and a bunch of wrapper scripts. Then, I did it with {rix} and {rixpress}.
Both pipelines produce the exact same result. But the way to get there is fundamentally different…
The bug that taught me more about PyTorch than years of using it
A loss plateau that looked like my mistake turned out to be a PyTorch bug. Tracking it down meant peeling back every layer of abstraction, from optimizer internals to GPU kernels…
People See Text, But LLM Not
“Aoccdrnig to a rscheearch at Cmabrigde Uinervtisy, it deosn’t mttaer in waht oredr the ltteers in a wrod are.” — Davis, Matt (2012)
You can probably read that sentence effortlessly. Despite the chaotic order of letters, your brain automatically reconstructs the intended words — because humans don’t read text letter by letter. We perceive shapes, patterns, and visual words…
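Scrambled text like the quote above is easy to generate, which makes it a handy probe for comparing how people and LLMs handle the same input. This small script is our own illustration, not code from the article; the function name and seed are hypothetical. It shuffles each word's inner letters while keeping the first and last letters fixed.

```python
import random
import re


def scramble_interior(text, seed=0):
    """Shuffle the inner letters of each word, keeping the first and
    last letters in place. Non-letter characters are left untouched."""
    rng = random.Random(seed)

    def shuffle_word(match):
        word = match.group(0)
        if len(word) <= 3:
            return word  # at most one inner letter, nothing to shuffle
        middle = list(word[1:-1])
        rng.shuffle(middle)
        return word[0] + "".join(middle) + word[-1]

    # Match runs of ASCII letters only, so punctuation stays where it is.
    return re.sub(r"[A-Za-z]+", shuffle_word, text)


print(scramble_interior("According to research at Cambridge University"))
```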
The Principles of Diffusion Models
This monograph presents the core principles that have guided the development of diffusion models, tracing their origins and showing how diverse formulations arise from shared mathematical ideas. Diffusion modeling starts by defining a forward process that gradually corrupts data into noise, linking the data distribution to a simple prior through a continuum of intermediate distributions. The goal is to learn a reverse process that transforms noise back into data while recovering the same intermediates. We describe three complementary views…
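The "forward process that gradually corrupts data into noise" has a closed form in one well-known formulation, the DDPM-style variance-preserving process: x_t = sqrt(alpha_bar_t) x_0 + sqrt(1 - alpha_bar_t) eps, where alpha_bar_t is the cumulative product of (1 - beta). The sketch below is ours and is not taken from the monograph; the linear beta schedule over 1,000 steps and the toy data are assumptions.

```python
import numpy as np


def forward_diffuse(x0, t, betas, rng):
    """Sample x_t from q(x_t | x_0) in the variance-preserving (DDPM-style)
    formulation: x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps."""
    alpha_bar = np.cumprod(1.0 - betas)[t]  # cumulative signal retention
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps


betas = np.linspace(1e-4, 0.02, 1000)  # assumed linear noise schedule
rng = np.random.default_rng(0)
x0 = np.full(10_000, 3.0)  # toy "data": every sample equals 3

for t in (0, 250, 999):
    xt = forward_diffuse(x0, t, betas, rng)
    print(t, round(float(xt.mean()), 2), round(float(xt.std()), 2))
```

As t grows, the sample mean shrinks toward 0 and the standard deviation grows toward 1, so the data distribution is carried to the simple N(0, 1) prior; a diffusion model learns to run this process in reverse.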
Citizens’ smartphones unravel earthquake shaking in urban areas
In this work, we use statistical spatial modelling to show that smartphone measurements collected by the Earthquake Network citizen science initiative allow us to map site amplification at high spatial resolution, generate high-resolution ShakeMaps, and improve existing ground motion models. We apply the method to the red zone of Campi Flegrei, Italy, an area of high volcanic and seismic risk with around 500,000 inhabitants and a complex spatial pattern of site amplification…
What exactly does a Data Engineering Manager at a FAANG company or in a $250k+ role do day-to-day [Reddit]
With over 15 years of experience leading large-scale data modernization and cloud migration initiatives, I’ve noticed that despite handling major merger integrations and on-prem to cloud transformations, I’m not getting calls for Data Engineering Manager roles at FAANG or $250K+ positions. What concrete steps should I take over the next year to strategically position myself and break into these top-tier opportunities? Are there any tools for ATS optimization, auto-apply, or rewriting, or any reference cover letters or resumes…
Bayesian data analysis for newcomers
This article explains the foundational concepts of Bayesian data analysis using virtually no mathematical notation…The article clarifies misconceptions about Bayesian methods that newcomers might have acquired elsewhere. We discuss prior distributions and explain how they are not a liability but an important asset. We discuss the relation of Bayesian data analysis to Bayesian models of mind, and we briefly discuss what methodological problems Bayesian data analysis is not meant to solve. After you have read this article, you should have a clear sense of how Bayesian data analysis works and the sort of information it delivers, and why that information is so intuitive and useful for drawing conclusions from data…
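As a small, concrete taste of "priors as an asset", here is the textbook Beta-Binomial conjugate update. It is our example rather than the article's (which deliberately avoids notation); the prior Beta(2, 2) and the 7-out-of-10 data are made-up numbers.

```python
from fractions import Fraction


def beta_binomial_update(alpha, beta, successes, failures):
    """Conjugate update: a Beta(alpha, beta) prior combined with binomial
    data gives a Beta(alpha + successes, beta + failures) posterior."""
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution, as an exact fraction."""
    return Fraction(alpha, alpha + beta)


prior = (2, 2)  # mild prior belief that the rate is near 0.5
post = beta_binomial_update(*prior, successes=7, failures=3)
print(post, beta_mean(*post))  # (9, 5) 9/14
```

The raw proportion is 7/10 = 0.70, while the posterior mean 9/14 (about 0.64) is pulled gently toward the prior; with more data the prior's influence fades.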

Last Week's Newsletter's 3 Most Clicked Links

* Based on unique clicks.
** Find last week's issue #622 here.

Cutting Room Floor

Whenever you're ready, 2 ways we can help:

Looking to get a job? Check out our “Get A Data Science Job” Course
It is a comprehensive course that teaches you everything you need to know about getting a data science job, based on answers to thousands of reader emails like yours. The course has three sections: Section 1 covers how to get started, Section 2 covers how to assemble a portfolio to showcase your experience (even if you don’t have any), and Section 3 covers how to write your resume.
Promote yourself/organization to ~68,500 subscribers by sponsoring this newsletter. 30-35% weekly open rate.

Thank you for joining us this week! :)

Stay Data Science-y!

All our best,
Hannah & Sebastian

Data Science Weekly Newsletter
