Data Science Weekly - Issue 623
Curated news, articles and jobs related to Data Science, AI, & Machine Learning
Issue #623
October 30, 2025
Hello!
Once a week, we write this email to share the links we thought were worth sharing in the Data Science, ML, AI, Data Visualization, and ML/Data Engineering worlds.
And now…let's dive into some interesting links from this week.
Editor's Picks
- Teaching Computers to Read: Rachel Wagner-Kaiser Interview 
 Teaching Computers to Read is a book focused on best practices and real-life challenges of building NLP and AI solutions for businesses…When I started this journey, my colleague and I were a bit struck by the dichotomy of AI NLP Books: either very high-level AI books targeted towards business leaders that really didn’t incorporate any technical details, or books that were very into the nitty-gritty technical details but lacked a lot of practical, business-driven advice. We wanted to create something that bridged the two: practical, technical advice about building useful AI solutions for business problems…
- 3Blue1Brown - But what is a Laplace Transform? 
 Visualizing the most important tool for differential equations…
- The Majority Of Your Users 
 The majority of your users don’t read your changelog. The majority of your users only upgrade to new versions when forced to. The majority of your users don’t know which version scheme your project uses. The majority of your users only read the documentation pages relevant to what they are trying to get done….
Featured Message
“Teaching Computers to Read”: New book for practical AI and NLP solutions
Successful AI solutions aren’t about chasing the newest model; they’re about solving the right problems in the right way. “Teaching Computers to Read” (out November 5 from CRC Press) focuses on what technical teams need to design, develop, deploy, and maintain useful NLP and AI solutions. Drawing on real-world experience and examples, the book offers actionable best practices to deliver adaptable, reliable AI systems that address business challenges with lasting, tangible value. Check out the code companion for hands-on practice! Learn more or check out the book on Amazon.
  
* Want to be featured in the newsletter? Email us for details --> team@datascienceweekly.org
Data Science Articles & Videos
- Useful engineering management artifacts 
 When managing a growing organization, it can be useful to have certain document templates on hand. Here’s a collection of documents I find useful…
- Thoughts Regarding Levelling Up as a Data Scientists [Reddit] 
 As I look for new opportunities, I see there are one or two skills from the job requirements that I don’t have. I am pretty sure I am not the only one in such a situation. How is everyone dealing with this kind of thing? Are you doing side projects to show you can pull that off, or are you blindly honest about it, claiming that you can pick that up on the job?…
- How We Did This Curiosity-Driven Research - An Open-Notebook Exploration of Emergent Grounding in LMs 
 A few days ago, we released a preprint showing evidence that symbol grounding can emerge in language models….The dust of release has not yet settled, and already a friend, after reading the manuscript, messaged me (Martin): “I wouldn’t have thought to link these things. What made you do it?” That question stayed with us, as it exposed a quiet truth about research: we too often present it as a polished surface or a linear arc from premise to conclusion, when in practice it unfolds as a cartography of detours. What looks, in retrospect, like a straight line was, in fact, a constellation of hunches, failed starts, fragile insights, and long hours spent staring at phenomena no one else thought worth staring at…
- Factors associated with: problems of using exploratory multivariable regression to identify causal risk factors 
 Many medical and epidemiological studies use multivariable regression to test whether several independent variables (exposures) are causal determinants of a health outcome. Where mutually adjusted regression coefficients are significant, the exposures are labelled as risk factors for the outcome. We call this study design “factors associated with.” In this article, we argue that this method is flawed due to a lack of reasoning about which variables are treated as confounders, multiple statistical testing, and post hoc interpretation of the results. In some cases, researchers use algorithmic or stepwise approaches to select exposure variables, which further exacerbates these problems…
- Memory consumption, dataset size and performance: how does it all relate? 
 In this post we will talk about dataset size and how it influences the performance of your program. We will also give a few common-sense ways to improve your program’s speed by decreasing the dataset size. Finally, we will give experiments to quantify the speed improvements. But before we begin, a few definitions…
- AI scientists history 
 AI Scientist as a term has been around for about 20 years. Recently, there are now papers and products described as AI Scientists. Here, I summarize some of those, the history of the term, and give some ideas on what might define an AI Scientist…
- Of course, someone has to write imperative code to build reproducible data science pipelines. It doesn’t have to be you. 
 The goal is to build a data science pipeline. The example here is purely illustrative, and compares a Nix-based approach to a non-Nix-based approach. So, I built the same polyglot Real Business Cycle model pipeline twice. First, I did it without {rixpress} (nor {rix}), using a combination of Docker, Make, and a bunch of wrapper scripts. Then, I did it with {rix} and {rixpress}. Both pipelines produce the exact same result. But the way to get there is fundamentally different…
- The bug that taught me more about PyTorch than years of using it 
 A loss plateau that looked like my mistake turned out to be a PyTorch bug. Tracking it down meant peeling back every layer of abstraction, from optimizer internals to GPU kernels…
- “Aoccdrnig to a rscheearch at Cmabrigde Uinervtisy, it deosn’t mttaer in waht oredr the ltteers in a wrod are.” — Davis, Matt (2012) - You can probably read that sentence effortlessly. Despite the chaotic order of letters, your brain automatically reconstructs the intended words — because humans don’t read text letter by letter. We perceive shapes, patterns, and visual words… 
- The Principles of Diffusion Models 
 This monograph presents the core principles that have guided the development of diffusion models, tracing their origins and showing how diverse formulations arise from shared mathematical ideas. Diffusion modeling starts by defining a forward process that gradually corrupts data into noise, linking the data distribution to a simple prior through a continuum of intermediate distributions. The goal is to learn a reverse process that transforms noise back into data while recovering the same intermediates. We describe three complementary views…
- Citizens’ smartphones unravel earthquake shaking in urban areas 
 In this work, we use statistical spatial modelling to show that smartphone measurements collected by the Earthquake Network citizen science initiative allow us to map site amplification at high spatial resolution, generate high-resolution ShakeMaps, and improve existing ground motion models. We apply the method to the red zone of Campi Flegrei, Italy, a high volcanic and seismic risk area with around 500,000 inhabitants and characterised by a complex spatial pattern of site amplification…
- What exactly does a Data Engineering Manager at a FAANG company or in a $250k+ role do day-to-day [Reddit] 
 With over 15 years of experience leading large-scale data modernization and cloud migration initiatives, I’ve noticed that despite handling major merger integrations and on-prem to cloud transformations, I’m not getting calls for Data Engineering Manager roles at FAANG or $250K+ positions. What concrete steps should I take over the next year to strategically position myself and break into these top-tier opportunities? Any tools which can do ATS, AutoApply, rewrite, any reference cover letter or resume…
- Bayesian data analysis for newcomers 
 This article explains the foundational concepts of Bayesian data analysis using virtually no mathematical notation…The article clarifies misconceptions about Bayesian methods that newcomers might have acquired elsewhere. We discuss prior distributions and explain how they are not a liability but an important asset. We discuss the relation of Bayesian data analysis to Bayesian models of mind, and we briefly discuss what methodological problems Bayesian data analysis is not meant to solve. After you have read this article, you should have a clear sense of how Bayesian data analysis works and the sort of information it delivers, and why that information is so intuitive and useful for drawing conclusions from data…
Last Week's Newsletter's 3 Most Clicked Links
* Based on unique clicks.
** Find last week's issue #622 here.
Whenever you're ready, 2 ways we can help:
- Looking to get a job? Check out our “Get A Data Science Job” Course 
 It is a comprehensive course that teaches you everything you need to know about getting a data science job, based on answers to thousands of reader emails like yours. The course has three sections: Section 1 covers how to get started, Section 2 covers how to assemble a portfolio to showcase your experience (even if you don’t have any), and Section 3 covers how to write your resume.
- Promote yourself/organization to ~68,500 subscribers by sponsoring this newsletter. 30-35% weekly open rate. 
Thank you for joining us this week! :)
Stay Data Science-y!
All our best,
Hannah & Sebastian



