Data Science Weekly - Issue 640
Curated news, articles and jobs related to Data Science, AI, & Machine Learning
February 26, 2026
Hello!
Once a week, we write this email to share the links we thought were worth sharing in the Data Science, ML, AI, Data Visualization, and ML/Data Engineering worlds.
Sponsor Message
Think of Otio as a Google Drive with an AI built in.
The bottleneck isn’t access to information - it’s synthesis. Thirty tabs, a dozen reports, scattered spreadsheets, a handful of half-read articles, and all of it needs to become a deliverable by Friday.
Otio.ai is a research workspace where everything lives in one place - upload directly or connect your Google Drive. Unlimited storage, no file juggling.
Drop in a CSV and start making sense of your data. Otio analyses it, generates visualisations, and lets you chat across it, alongside every other source - using any AI model (Claude, GPT, Gemini, DeepSeek, Grok). Switch models without re-uploading and find every answer source-grounded with verifiable citations.
Finally, turn answers into exportable deliverables - reports, presentations, documents - directly from your material. 200,000+ researchers, analysts, and consultants use Otio to go from raw data to finished work.
Try Otio free - chat with your data, visualise it, deliver it.
And now…let’s dive into some interesting links from this week.
Editor's Picks
Honey, I Tiled the Tensors
Shapes, Strides, Swizzles and Suffering! - An intro to Layout Algebra…Layouts are a powerful abstraction introduced in NVIDIA’s CuTe library for making operations on complicated Tensor configurations a little bit easier to understand. My goal here is to provide a good taste of how operations on these layouts work and an example matrix-matrix multiplication kernel to show the value of these abstractions and their drawbacks...
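For readers new to the idea, here is a minimal sketch of what a layout does at its simplest: it pairs a shape with strides and maps a logical coordinate to a linear memory offset. This is plain Python, not CuTe's API, and the names `layout_offset` and `offsets` are made up for illustration; the linked post covers the full algebra, including swizzles.

```python
from itertools import product


def layout_offset(coord, stride):
    """Map a logical coordinate to a linear offset: sum of coord[i] * stride[i]."""
    return sum(c * s for c, s in zip(coord, stride))


def offsets(shape, stride):
    """Offsets of every coordinate in `shape`, visited with the last index fastest."""
    coords = product(*(range(n) for n in shape))
    return [layout_offset(c, stride) for c in coords]


shape = (2, 3)
row_major = (3, 1)  # stepping one row skips 3 elements
col_major = (1, 2)  # stepping one column skips 2 elements

row_offsets = offsets(shape, row_major)
col_offsets = offsets(shape, col_major)

print(row_offsets)  # [0, 1, 2, 3, 4, 5]
print(col_offsets)  # [0, 2, 4, 1, 3, 5]
```

The same six elements are reached in both cases; only the order in memory changes, which is the property tiling and swizzling build on.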
Britain Lost 14,000 Third Places. They Were Called Pubs. Is Your Local Next?
How private equity reshaped the local, the postcode tool that shows the pubs most at risk and most importantly what to do about it…what I found isn’t really a story about beer. It’s a story about what happens to a country when the places where people belong become assets on someone else’s balance sheet…
Deep Learning for Tabular Data: The Foundation Model Era
It’s been nearly four years since I first summarized the state of deep learning (DL) for tabular data, and about three years since my follow-up post. Back then, the verdict was pretty clear: for most tabular data scenarios, especially those with heterogeneous features and even very large sample sizes, gradient boosting methods like XGBoost, LightGBM, and CatBoost, were still the pragmatic and performant choice. Complex DL architectures like TabNet and even transformer-based models struggled to consistently outperform these boosting approaches, which meant they weren’t really viable options for most practitioners or production. The question now is: has anything fundamentally changed?…
What’s on your mind
This Week’s Poll:
Last Week’s Poll:
Data Science Articles & Videos
Querying 3 billion vectors
Recently, I got nerd-sniped by this exchange between Jeff Dean and someone trying to query 3 billion vectors. I was curious to see if I could implement the optimal map-reduce solution he alludes to in his reply…I started by writing an extremely naive implementation which made the following assumptions…
Corporate Politics for Data Professionals [Reddit]
I recently learned the hard way that, even for technical roles, like DS, at very technical companies, corporate politics and managing relationships, positioning, and expectations play as much of a role as technical knowledge and raw IQ. What have been your biggest lessons for navigating corporate environments and what advice would you give to young DS who are inexperienced in these environments?…
Superintelligence and Law
Operating autonomously or under only limited human oversight, AI agents will assume a growing range of roles in the legal system. First, in making consequential decisions and taking real-world actions, AI agents will become de facto subjects of law. Second, to cooperate and compete with other actors (human or non-human), AI agents will harness conventional legal instruments and institutions such as contracts and courts, becoming consumers of law. Third, to the extent AI agents perform the functions of writing, interpreting, and administering law, they will become producers and enforcers of law…
Causal inference for psychologists who think that causal inference is not for them
Psychologists’ causal inference training often focuses on the conclusion that experiments are needed, without much consideration for the causal inference frameworks used elsewhere. This leaves researchers ill-equipped to solve inferential problems that they encounter in their work, leading to mistaken conclusions and incoherent statistical analyses. For a more systematic approach to causal inference, this article provides brief introductions to the potential outcomes framework, the “lingua franca” of causal inference, and to directed acyclic graphs, a graphical notation that makes it easier to systematically reason about complex causal situations…
Agentic workflows for software development (The promise and the reality: Notes from the field)
As we’ve observed from McKinsey engagements, while the “developer with AI assistant” model makes individual practitioners faster, in an enterprise context, the efficiency improvement from idea to live feature is typically less significant. The handoff from requirements to design to implementation is where context goes to die. Decisions buried in Slack threads. Assumptions in someone’s head. Rationale re-litigated because no one can find the original reasoning. AI assistants can accelerate the work within a phase of the SDLC as long as you don’t expect them to fix the boundaries between them…
Digital Science awards 2025 Catalyst Grants to two teams visualizing the future of research
The winning teams, both based in the United States, will use the funding to develop their ideas, including visualizations that demonstrate the influence and impact of their research. Their innovations are directly relevant to researchers, academic institutions, scholarly publishers, and funders.
The winning teams from Digital Science’s 2025 Catalyst Grant round are:
FigureTwo - transforms static research figures into interactive, data-connected visuals
Pathfinder - maps how ideas spread across more than 100,000 research communities, showing how discoveries in one area influence progress in another
I Spent the Last Month and a Half Building a Model that Visualizes Strategic Golf (A way to actually see the ideas hidden in golf course architecture)
For the last month and a half, I’ve been working nonstop on a project to illustrate what I believe is a new way to look at golf design. And after more consecutive late nights of coding than I’d like to admit to my partner, I’ve finally cleared the first big hurdle. Here, I’m going to walk through the tool I’ve been furiously building from scratch to bring my way of looking at golf architecture – an expansion on Mark Broadie’s foundational strokes gained approach – to life. It’s going to get a bit technical, so please bear with me…
Data To Art
Data To Art is a curated online gallery showcasing the work of international data experts. It explores the beauty of data and the power of visual storytelling…
Field Sobriety Tests and the Base Rate Fallacy
In Chapter 9 of Probably Overthinking It, I wrote about Drug Recognition Experts (DREs), who are law enforcement officers trained to recognize impaired drivers. I reviewed the research papers that were supposed to evaluate the accuracy of DREs, and I summarized my impressions like this: “What I found was a collection of studies that are, across the board, deeply flawed. Every one of them features at least one methodological error so blatant it would be embarrassing at a middle school science fair…”…
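As a quick illustration of the base rate fallacy named in the title, the sketch below applies Bayes' rule to a yes/no test. The sensitivity, specificity, and base rate values are made-up numbers chosen for illustration; they are not taken from the book or from the DRE studies it reviews, and `positive_predictive_value` is a hypothetical helper name.

```python
def positive_predictive_value(sensitivity, specificity, base_rate):
    """P(condition is true | test says yes), by Bayes' rule."""
    true_pos = sensitivity * base_rate
    false_pos = (1 - specificity) * (1 - base_rate)
    return true_pos / (true_pos + false_pos)


# Same hypothetical test (90% sensitive, 90% specific), two populations.
high = positive_predictive_value(0.9, 0.9, base_rate=0.50)
low = positive_predictive_value(0.9, 0.9, base_rate=0.05)

print(f"base rate 50%: {high:.2f}")
print(f"base rate  5%: {low:.2f}")
```

With these assumed numbers, the same test is right about 90% of the time it says yes when half the population has the condition, but only about 32% of the time when 5% does. Accuracy measured on a high-base-rate sample can therefore overstate how the test performs elsewhere.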
How I Use Claude Code
I’ve been using Claude Code as my primary development tool for approx 9 months, and the workflow I’ve settled into is radically different from what most people do with AI coding tools…The workflow I’m going to describe has one core principle: never let Claude write code until you’ve reviewed and approved a written plan. This separation of planning and execution is the single most important thing I do. It prevents wasted effort, keeps me in control of architecture decisions, and produces significantly better results with minimal token usage than jumping straight to code…
When I started writing this blog in late 2020, one of my first ideas for a post was called “How not to be fooled by viral charts”. I had a list of famous graphs all ready to go. But for some reason, I postponed that post, and over the years, the list of charts kept growing, and I kept putting it off. Well, no longer. I’ve finally been so annoyed by a viral chart that I can no longer put off this post. But the list has grown so long that I’m going to have to split the post into two parts. So today we’re going to learn how to identify charts that contain misinformation: intentional deception, careless mistakes, or just generally meaningless data. In part 2, we’ll learn how to interpret charts that use good data, but which tell a story that’s more nuanced than most people realize…
What changed between my failed interviews and the one that got me an offer [Reddit]
I went through a pretty rough interview cycle last year, applying to data analyst/data scientist roles (mostly around NYC). made it to final rounds a few times, but still got rejected. i finally landed an offer a few months ago, and thought i’d just share what changed and might guide others going through the same thing right now…so essentially for me the breakthrough wasn’t just to learn another tool or grind more questions. though i’m no longer interviewing for data roles, i’d love to hear other successful candidate experiences. might help those looking for tips or even just encouragement on this sub! :)…
Untapped Way to Learn a Codebase: Build a Visualizer
The biggest shock of my early career was just how much code I needed to read that others wrote…In this post, I’m going to walk you through how I learn an unfamiliar codebase. But I’ll admit, this isn’t precisely how I would do it today. After years of working on codebases, I’ve learned quite a lot of shortcuts. Things that come with experience that just don’t translate for other people. So what I’m going to present is a reconstruction. I want to show bits and parts of how I go from knowing very little to gaining knowledge and ultimately, asking the right questions…To do this, I will use just a few techniques:
Setting a goal
Editing randomly
Fixing things I find that are broken
Reading to answer questions
Making a visualizer
Last Week's Newsletter's 3 Most Clicked Links
* Based on unique clicks.
** Please take a look at last week's issue #639 here.
Cutting Room Floor
Whenever you're ready, 2 ways we can help:
Looking to get a job? Check out our “Get A Data Science Job” Course
It is a comprehensive course that teaches you everything you need to know about getting a data science job, based on answers to thousands of reader emails like yours. The course has three sections: Section 1 covers how to get started, Section 2 covers how to assemble a portfolio to showcase your experience (even if you don’t have any), and Section 3 covers how to write your resume.
Promote yourself/organization to ~68,750 subscribers by sponsoring this newsletter. 30-35% weekly open rate.
Thank you for joining us this week! :)
Stay Data Science-y!
All our best,
Hannah & Sebastian



