Data Science Weekly - Issue 652
Curated news, articles and jobs related to Data Science, AI, & Machine Learning
Issue #652
May 21, 2026
Hello!
Once a week, we write this email to share the links we thought were worth sharing in the Data Science, ML, AI, Data Visualization, and ML/Data Engineering worlds.
And now…let’s dive into some interesting links from this week.
Editor's Picks
What’s going on in computational neuroscience nowadays? (part 1)
A month ago I came back from Cosyne, the annual Computational and Systems Neuroscience conference…The days are a haze of tutorials, talks, poster sessions, and workshops, usually appended by dinners and drinks past midnight…I seem to have a hard time writing one-off pieces, so I’m leaning into that and writing this as a series. I’ll only be able to write a very narrow perspective of Cosyne, of course, but most of the talks are on YouTube on the official channel if you want to see for yourself. (That also means this will be more personal thoughts than report)…
Is logistic regression regression?
I came across a post recently by a machine learning engineer who made the bold claim that logistic regression is the worst name for an algorithm ever, or something along those lines…Many statisticians of the more old-school type seemed to disagree. This led me to think a bit more deeply about the subject. I’ve already written several posts on bad terminology in statistics (see confidence level, line of best fit, r squared) so I might have been expected to agree with the machine learning view, but in this case I agree with the statisticians, and I would like to explain why…
What Every Experimenter Must Know About Randomization
Randomized controlled experiments offer gold-standard insight into cause and effect, the knowledge that informs our most important decisions. Unfortunately, randomization in such experiments is often botched. Randomization errors silently invalidate the interpretation of experimental results, turning a fruitful quest for knowledge into a waste of time and money, or, worse, a wellspring of misinformation. Fortunately, these fatal errors are easy to spot and fix. So whether you’re a webmaster using A/B testing to increase engagement, a medical researcher evaluating vaccines, a factory manager exploring productivity improvements, or a scientist seeking the laws that govern nature or human affairs, read on…
Data Science Articles & Videos
Converting testthat Tests to testit
Back in 2013, I wrote about testing R packages when I first released testit. Thirteen years later, I still believe that unit testing should be nothing more than “tell me if something unexpected happened.” Recently I converted a large testthat test suite to testit, and I thought I’d share a practical guide for anyone considering the same move…
I’ve been in data science roles (both analytics and ML) for about 5 years now across a couple of companies. Lately I’ve been feeling a bit burned out because I keep seeing the same pattern…We spend weeks cleaning data, building dashboards, running statistical analysis, or training models… and then the stakeholders either:
Say “thanks” and never use it
Cherry-pick the numbers that support their existing opinion
Or just completely ignore the findings and go with gut feel anyway
The worst part is when leadership asks for a “data-driven decision” but they’ve already decided what they want to do…Am I alone in this? Or is this just the reality of data science in most companies?…
Tagging my blog posts with BERTopic and LLMs
I recently added tags to my blog using BERTopic and a mix of LLMs. You can see the tags in the sidebar to the right (or in the footer on mobile). I’ve done this before in 2023, with GGUF Mistral using llama-cpp, but never finished the project. Now, because the models have been getting so good, and my project was small, relatively well-defined, and easy to evaluate, the project took me about 6-10 hours over a month, using BERTopic, Claude Code, and Pi with Deepseek…
What data science is actually about in the age of AI
I reflect on the evolving role of data scientists in the age of AI and LLMs. I argue that our core mission remains rigorous measurement, not full-stack development. While AI tools make building easier, the real value comes from defining and evaluating what truly matters. I share why measurement should be led by those closest to the problem and how data scientists can best contribute. Are we losing sight of what makes data science essential in the rush to build with AI?…
I’ve wanted to dive deeper into the fundamentals of AI for a while now - it feels a little bit magical, and a little bit wrong, to operate alongside AI without a strong understanding of how the underlying mechanisms work. Naturally, I had to write a transformer, and Neel Nanda’s “GPT-2 From Scratch” was my resource of choice…This post is meant to document my process of learning and to address some of the questions I was curious about when implementing the transformer for the first time. It includes an overview of transformer basics and some of my intuitions, followed by some of the points of interest (transformer secrets, if you will) and challenges I ran into…
The last six months in LLMs in five minutes
I presented this lightning talk at PyCon US 2026, attempting to summarize the last six months of developments in LLMs in five minutes…
Five tips for managing your R-universe
rOpenSci’s R-universe system is an open source platform allowing users to create their own CRAN-like universe of R packages…This post gives five tips I have developed to help manage my R-universe…
Research-Driven Agents: What Happens When Your Agent Reads Before It Codes
Coding agents working from code alone generate shallow hypotheses. Adding a research phase (arXiv papers, competing forks, other backends) produced 5 kernel fusions that made llama.cpp CPU inference 15% faster.
Kalman and Extended Kalman Filters: Concept, Derivation and Properties
This report presents and derives the Kalman filter and the Extended Kalman filter dynamics. The general filtering problem is formulated and it is shown that, under linearity and Gaussian conditions on the systems dynamics, the general filter particularizes to the Kalman filter. It is shown that the Kalman filter is a linear, discrete time, finite dimensional time-varying system that evaluates the state estimate that minimizes the mean-square error…
How not to turn power on its head
In giving some informal remarks about power at a seminar a couple of weeks ago, I proposed that the tendency to turn the notion of power on its head might be avoided by imagining we need to define a test’s error probabilities in terms of its power alone. We can refer to the power against the null hypothesis, rather than alluding to a type 1 error probability, for example…What do I mean by turning power on its head? I mean, at least here, supposing that a test provides poor evidence of discrepancies that the test has low power to detect…
The Atlas-Learn Approach to the Manifold Hypothesis
The 2025 paper by Robinett et al., ‘Atlas-based Manifold Representations for Interpretable Riemannian Machine Learning’, provides an algorithm for fitting a low dimensional manifold from a point cloud by means of a novel algorithm for approximating an atlas of charts. This post illustrates the Atlas-Learn method by reconstructing a sphere from a 3D point cloud of naive random samples and works through some checks on accuracy…
Are there any small, quick things I can do everyday to keep my skills sharp? [Reddit]
I’m sure everyone knows about the dilemma of AI at this point. We want to work faster but our skills are atrophying yada yada…as a junior data scientist, I feel like I barely had any skills to begin with. Now with my company forcing us to use AI, I feel like I’m not learning much. Now I’ve been doing leetcode, but I just don’t think it’s that applicable to my real job. I don’t have the bandwidth outside of work to do a project yet, since my company is working us to the bone. What are some quick habits/tools/websites/apps you recommend to keep your skills sharp?…
The Measurement of Loudness
I’m not a sound engineer of any sort, but I enjoy music and have been blessed with decent hearing acuity, so I tend to pay attention to noises going on around me. Now, I know what you’re thinking! Surely, a measurement nerd interested in sound would have bought a cheap SPL meter off the internet and this is what this post is about. And you’d be wrong! Hah! This post goes a bit further off the deep end, because SPL and the dB(A) scales that we commonly associate with “sound volume” always confused me when I tried to understand them. Just as measuring color is really difficult because it sits at the intersection of a physical measurement and human perception (see: How the heck does one measure color?), sound is just as messy because it is again a physical measurement (sound pressure) mediated by the human auditory system. To measure loudness, we’re going to have to go back a bit in history…
Last Week's Newsletter's 3 Most Clicked Links
* Based on unique clicks.
** Please take a look at last week's issue #651 here.
Cutting Room Floor
Comments on: What Every Experimenter Must Know About Randomization
Introducing Metrics SQL: A SQL-based semantic layer for humans and agents
Collaborating between Bioconductor and R-universe on Development of Common Infrastructure
Whenever you're ready, 3 ways we can help:
Go deeper each week (paid subscription)
Get 3 additional posts per week designed to help you:
Statistics → understand the math behind ML
AI Agents → build with modern AI tools
Career → become more valuable at your job
Looking to get a job?
A practical guide to landing your first (or next) data science role, based on thousands of reader questions.
👉 Check out our “Get A Data Science Job” Course
Promote your organization/project/event to ~68,500 subscribers
Sponsor this newsletter and reach a highly engaged data science audience (30–35% open rate).
👉 Reply to this email to learn more
Thank you for joining us this week! :)
Stay Data Science-y!
All our best,
Hannah & Sebastian

