Data Science Weekly - Issue 598
Curated news, articles and jobs related to Data Science, AI, & Machine Learning
May 08, 2025
Hello!
Once a week, we write this email to share the links we thought were worth sharing in the Data Science, ML, AI, Data Visualization, and ML/Data Engineering worlds.
And now…let's dive into some interesting links from this week.
Editor's Picks
Task estimation: Conquering Hofstadter's Law
Estimating the completion time of tasks is important work. It’s also something that brings great anguish to a lot of engineers and software developers, and causes significant friction between dev teams, management, other departments, and customers. This is because almost everybody is still talking about estimation the wrong way…
Understanding Modern AI is Understanding Embeddings: A Guide for Non-Programmers (with lots of dogs!)
Embeddings are a core AI concept that underpins a great deal of what we today think of as being AI. This article is going to give you an accurate and intuitive understanding of what an “embedding” is in less time than it takes to eat a (very large) bagel, and possibly make you think they’re as cool as I think they are. It even explains how we got to embeddings as a solution, by looking at everything else we tried along the way. If you’re comfortable with even very simple Excel formulas, you’ll understand all the maths, and there’s even a cute graph with dogs on it…
How a straight line teaches machines to learn
Building an intuitive understanding of how Linear Regression works and how it leads to Gradient Descent…Learning, to a computer, is just turning bad guesses into better ones. In this post, we’ll see how that starts with a straight line: how linear regression makes the first guess, and gradient descent keeps improving it…
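To make the "first guess, then keep improving it" idea concrete, here is a minimal sketch of our own (not code from the post): it fits y ≈ w·x + b to five points by repeatedly nudging w and b against the gradient of the mean squared error. The function names fit_line and mse, the learning rate, and the step count are illustrative choices, not anything the article prescribes.

```python
def fit_line(xs, ys, lr=0.05, steps=2000):
    """Fit y ≈ w*x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0  # the first "bad guess": a flat line through the origin
    n = len(xs)
    for _ in range(steps):
        # Residuals of the current guess at every data point
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of MSE = (1/n) * sum(err^2) with respect to w and b
        grad_w = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2.0 / n) * sum(errors)
        # Step downhill: each update turns the guess into a slightly better one
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def mse(xs, ys, w, b):
    """Mean squared error of the line w*x + b on the data."""
    return sum(((w * x + b) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # points lying exactly on y = 2x + 1
w, b = fit_line(xs, ys)
print(round(w, 3), round(b, 3))  # 2.0 1.0
```

The learning rate matters: too large and the updates overshoot and diverge; too small and the line creeps toward the answer very slowly.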
Data Science Articles & Videos
Vector Database vs Graph Database: Key Differences
While traditional relational databases, which primarily handle structured data, have long been the standard, specialized database technologies are emerging to handle vast amounts of both structured and unstructured data and complex queries. Among these new types of databases are graph and vector databases, both highly specialized and designed for specific tasks. Although there is some potential overlap in use cases, such as the fact that both are frequently used in AI applications for RAG (retrieval augmented generation) capabilities, they are fundamentally different in their approach…
Preparing for a DeepMind Gemini Team Interview: Any Resources, Tips, or Experience to Share? [Reddit]
I'm currently preparing for interviews with the Gemini team at Google DeepMind, specifically for a role that involves system design for LLMs and working with state-of-the-art machine learning models…I'm reaching out because I'd love to hear from anyone who:
Has gone through a DeepMind, Gemini, or similar AI/ML research team interview
Has tips for LLM-related system design interviews
Can recommend specific papers, blog posts, podcasts, videos, or practice problems that helped you
Has advice on team culture, communication, or mindset during the interview process
I'm particularly interested in how they evaluate "system design for ML" compared to traditional SWE system design, and what to expect culture-wise from Gemini's team dynamics…
AI Ethics Course
AI is already shaping high-stakes decisions: who gets hired, who qualifies for a loan, who gets access to services. But when it’s not built responsibly, it can amplify inequality, limit opportunity, and erode trust. This course is part of the DIVERSIFAIR project, an EU-backed initiative created to help professionals build ethical AI that’s fair, transparent, and accountable, not just technically accurate…
Python Polars: The Definitive Guide, with Jeroen Janssens and Thijs Nieuwdorp
Jeroen Janssens and Thijs Nieuwdorp, the data frame library Polars’ greatest advocates, join Jon Krohn in this episode to discuss their book, Python Polars: The Definitive Guide; best practices for using Polars; why Pandas users are switching to Polars for data frame operations in Python; and how the library can cut memory usage and compute time by up to 10x compared with Pandas…
Zero to One: Learning Agentic Patterns
AI agents. Agentic AI. Agentic architectures. Agentic workflows. Agentic patterns. Agents are everywhere…This post aims to explore common design patterns. Think of these patterns as blueprints or reusable templates for building AI applications. Understanding them provides a mental model for tackling complex problems and designing systems that are scalable, modular, and adaptable. We'll dive into several common patterns, differentiating between more structured workflows and more dynamic agentic patterns. Workflows typically follow predefined paths, while agents have more autonomy in deciding their course of action…
Learning Algorithm Of Biological Networks
My name is Artem. I'm a graduate student at the NYU Center for Neural Science and a researcher at the Flatiron Institute. In this video we explore Predictive Coding, a biologically plausible alternative to the backpropagation algorithm, and derive it from first principles…
Linear Programming for Fun and Profit
At Modal, we’ve built a “resource solver” system which is capable of finding and enjoying arbitrages in this cloud storm, satiating our customers’ demand for scalable compute at good prices. Did you know that a few months ago you could get hundreds of superior H200 GPUs for 20% less than the going rate for inferior H100s? The solver did, and it took that deal. At its core, Modal’s resource solver is a linear programming, or LP, solver: an algorithm which can quickly and reliably maximize an objective given a set of linear constraints…
Google’s Hybrid Approach to Research
We describe how we organize Computer Science (CS) research at Google. We focus on how we integrate research and development (R&D) and discuss the benefits and risks of our approach…
Navigating a career in statistics: reflections from senior leaders
I recently spoke to colleagues in senior roles who had been government statisticians early in their careers, to find out more about their journeys and what advice they had for the latest recruits…the statisticians included in this blog have gone on to senior roles in a range of areas, including policy, programme management, analytical leadership and data focused positions. They universally felt their background as a statistician was helping them to succeed…
Sampling Raster Data with XArray
This tutorial covers the technique for efficiently interpolating and sampling raster data using XArray and rioxarray…
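The core operation behind point sampling is easy to sketch without any libraries. Below is a minimal pure-Python illustration (our own, not taken from the tutorial) of nearest-neighbour lookup and bilinear interpolation on a regular grid, assuming ascending x and y coordinates and values indexed as grid[row][col]. The helper names are hypothetical. In XArray the analogous operations are DataArray.sel(..., method="nearest") and DataArray.interp(...), which work on labelled coordinates; passing the point coordinates as DataArrays that share one dimension samples point by point instead of building an outer grid.

```python
import bisect


def _cell_index(coords, v):
    """Index i with coords[i] <= v <= coords[i + 1], for ascending coords."""
    i = bisect.bisect_right(coords, v) - 1
    return max(0, min(i, len(coords) - 2))  # keep i + 1 in range at the upper edge


def sample_nearest(xs, ys, grid, px, py):
    """Return the grid value at the coordinate pair closest to (px, py)."""
    col = min(range(len(xs)), key=lambda j: abs(xs[j] - px))
    row = min(range(len(ys)), key=lambda i: abs(ys[i] - py))
    return grid[row][col]


def sample_bilinear(xs, ys, grid, px, py):
    """Bilinearly interpolate grid[row][col] at (px, py); xs and ys must be ascending."""
    if not (xs[0] <= px <= xs[-1] and ys[0] <= py <= ys[-1]):
        raise ValueError("point lies outside the raster extent")
    j = _cell_index(xs, px)
    i = _cell_index(ys, py)
    tx = (px - xs[j]) / (xs[j + 1] - xs[j])  # fractional position inside the cell along x
    ty = (py - ys[i]) / (ys[i + 1] - ys[i])  # fractional position inside the cell along y
    lower = grid[i][j] * (1 - tx) + grid[i][j + 1] * tx          # interpolate along row i
    upper = grid[i + 1][j] * (1 - tx) + grid[i + 1][j + 1] * tx  # interpolate along row i + 1
    return lower * (1 - ty) + upper * ty                         # blend the two rows


# Example raster whose values follow 10*x + 100*y exactly
xs = [0.0, 1.0, 2.0]
ys = [0.0, 1.0]
grid = [[0.0, 10.0, 20.0], [100.0, 110.0, 120.0]]
print(sample_nearest(xs, ys, grid, 1.4, 0.6))    # 110.0
print(sample_bilinear(xs, ys, grid, 1.5, 0.25))  # 40.0
```

Many GeoTIFFs store y in descending order (north at the top), so a hand-rolled version like this would need the rows flipped first; labelled-coordinate tools such as XArray spare you that bookkeeping.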
Untangling spaghetti code with smartrappy
smartrappy is designed to help you understand and visualise the dependencies in analytical Python projects that haven’t hit the heady heights of having auto-generated their own directed acyclic graph. Let’s be honest, that’s most projects. It’s important to say that smartrappy is just sniffing out relationships between code and data and other things in a project, and it does a fairly good job of that, but it’s not perfect and it certainly won’t find everything. For a lot of analytical Python code, though, it should give you a big boost in understanding what’s going on…
Introducing chores
What’s a 1-minute data science task you do all of the time? Could you teach another data scientist to do it with a couple paragraphs of explanation and an example or two? If so, you might benefit from checking out chores, a new package that helps you with tedious but hard-to-automate tasks…chores followed up on the initial release of ellmer, a package that makes it easy to use large language models (LLMs) from R. The package connects ellmer to your source editor in RStudio and Positron via a collection of chore helpers…
Deep Dive into Yann LeCun’s JEPA
In his position paper A Path Towards Autonomous Machine Intelligence and his many recent talks (linked below), Yann presents an alternative framework for achieving artificial intelligence. He also proposes a new architecture for a predictive world model: Joint Embedding Predictive Architecture (JEPA). This blog post will dive deep into Yann’s vision for AI, the JEPA architecture, current research, and energy-based models. I will go into the technical aspects of these ideas and share my opinions, along with interesting references. I will also cover recent research advances such as V-JEPA…
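As a rough illustration of the energy-based view the post discusses, the toy sketch below scores a (context, target) pair the JEPA way: both are embedded, a predictor maps the context embedding to a guess of the target embedding, and the energy is the squared distance between guess and actual embedding, so prediction happens in representation space rather than in raw input space. This is our own simplification, not the paper’s architecture: small linear maps stand in for deep encoders, the latent variable z is omitted, and the names (matvec, energy, ema_update) are hypothetical. The EMA update mimics the slowly updated target encoder used in I-JEPA-style training.

```python
# Toy JEPA-style energy (illustrative only): linear maps stand in for deep networks.

def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def energy(context_enc, target_enc, predictor, x, y):
    """Squared distance between the predicted and the actual embedding of y.

    Low energy means y is judged compatible with x in representation space.
    """
    s_x = matvec(context_enc, x)      # embed the observed context
    s_y = matvec(target_enc, y)       # embed the target (e.g. masked patches, future frames)
    s_y_hat = matvec(predictor, s_x)  # predict the target's embedding, not its raw values
    return sum((p - t) ** 2 for p, t in zip(s_y_hat, s_y))


def ema_update(target_enc, context_enc, decay=0.99):
    """Move the target encoder slowly toward the context encoder (exponential moving average)."""
    return [
        [decay * t + (1 - decay) * c for t, c in zip(t_row, c_row)]
        for t_row, c_row in zip(target_enc, context_enc)
    ]


identity = [[1.0, 0.0], [0.0, 1.0]]
print(energy(identity, identity, identity, [1.0, 2.0], [1.0, 2.0]))  # 0.0 (compatible pair)
print(energy(identity, identity, identity, [1.0, 2.0], [3.0, 2.0]))  # 4.0 (incompatible pair)
```

The sketch also exposes the failure mode the JEPA literature worries about: if both encoders map every input to the same vector, every pair gets zero energy. Preventing this collapse, through EMA target encoders or regularizers such as VICReg, is a central design concern.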
Last Week's Newsletter's 3 Most Clicked Links
* Based on unique clicks.
** Find last week's issue #597 here.
Cutting Room Floor
Whenever you're ready, 3 ways we can help:
Want to get better at Data Science / Machine Learning Math? I have two weekly tutoring slots open. Hit reply to this email and let me know what you want to learn.
Looking to get a job? Check out our “Get A Data Science Job” Course
It is a comprehensive course that teaches you everything related to getting a data science job based on answers to thousands of emails from readers like you. The course has 3 sections: Section 1 covers how to get started, Section 2 covers how to assemble a portfolio to showcase your experience (even if you don’t have any), and Section 3 covers how to write your resume.
Promote yourself or your organization to ~68,000 subscribers by sponsoring this newsletter (30-40% weekly open rate).
Thank you for joining us this week! :)
Stay Data Science-y!
All our best,
Hannah & Sebastian