Data Science Weekly - Issue 597
Curated news, articles and jobs related to Data Science, AI, & Machine Learning
May 01, 2025
Hello!
Once a week, we write this email to share the links we thought were worth sharing in the Data Science, ML, AI, Data Visualization, and ML/Data Engineering worlds.
And now…let's dive into some interesting links from this week.
Editor's Picks
I’ve been doing ML for 19 years. AMA [Reddit]
Built ML systems across fintech, social media, ad prediction, e-commerce, chat & other domains. I have probably designed some of the ML models/systems you use. I have been an engineer and a manager of ML teams. I also have experience as a startup founder…
What Are the Greatest Karaoke Songs of All Time? A Statistical Analysis
Once a modest diversion in sloppy dive bars, the karaoke industry is estimated at $5.5B worldwide and continues to grow as people seek communal experiences to supplement their increasingly digital lives. When one walks into a karaoke bar, they are presented with a thick binder featuring thousands of empty orchestral tunes, granting patrons near-limitless song selection. And yet, despite this sprawling songbook, almost every karaoke night gravitates toward a familiar roster of timeless crowd-pleasers…So today, we'll explore the most frequently selected karaoke songs, the commonalities among these staples, and the musical eras celebrated by this pastime…
What If We Could Rebuild Kafka From Scratch?
If we were to start all over and develop a durable cloud-native event log from scratch—Kafka.next if you will—which traits and characteristics would be desirable for this to have? Separating storage and compute and object store support would be table stakes, but what else should be there? Having used Kafka for many years for building event-driven applications as well as for running realtime ETL and change data capture pipelines, here’s my personal wishlist…
Sponsor Message
Analyse and visualise data with your AI assistant
Unlock the full potential of your data with Conjointly's Insights Explorer. The Insights Explorer is a free browser-based rswam IDE that includes an AI assistant, allowing you to generate analysis without writing additional code or installing software.
Simply ask the AI assistant questions about your data in plain English, and it will generate executable code ready to help you transform, visualise, and explore your data. Insights Explorer helps you simplify your workflow and streamline the path from data to decision.
Whether you need quick analysis or deep data exploration, the Insights Explorer helps you focus on the insights that really matter, rather than getting bogged down in technicalities and common pain points of data programming. Spend less time searching for the right libraries, looking up syntax, restructuring data or debugging code and more time interpreting results and extracting meaningful insights for business decisions.
* Want to sponsor the newsletter? Email us for details --> team@datascienceweekly.org
Data Science Articles & Videos
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems
This survey provides a comprehensive overview, framing intelligent agents within a modular, brain-inspired architecture that integrates principles from cognitive science, neuroscience, and computational research. We structure our exploration into four interconnected parts…
Eigenvectors and Eigenvalues (Explained Visually)
Eigenvalues/vectors are instrumental to understanding electrical circuits, mechanical systems, ecology and even Google's PageRank algorithm. Let's see if visualization can make these ideas more intuitive…
Backpropagation Through Time (BPTT): Explained With Derivations
For RNNs to learn sequential data, a variant of the backpropagation algorithm known as "Backpropagation Through Time" (BPTT) is used. In this article, we will delve into the intricate details of the BPTT algorithm and how it is used for training RNNs. We will cover the intuition and derivation of BPTT for training RNNs using Gradient Descent…
Presto: A Smarter Way to Use Satellite Data to Help Farmers
At a recent webinar hosted by the European Space Agency’s WorldCereal project, NASA Harvest’s Gabi Tseng presented alongside WorldCereal project partners to showcase how the geospatial model Presto is transforming crop mapping. In addition to introducing the science behind Presto, the webinar also showed how users can apply Presto to their own public and private data to create customized crop type maps, empowering users around the world to better understand agricultural landscapes…
How is your team using AI for DS? [Reddit]
I see a lot of job postings saying “leverage AI to add value”. What does this actually mean? Using AI to complete DS work, or is AI an extension of DS work?…I’ve seen a lot of cool use cases outside of DS like content generation or agents but not as much in DS itself. Mostly just code assist or document creation/summary, which is a tool to help DS but not DS itself…
Tips on How to Connect at Academic Conferences
How to navigate social situations and make friends is not always intuitive, and has to be learnt. This is particularly true at a conference, where the event is short (just a few days) and the number of people may be intimidatingly large (in the thousands of people). So I wrote this post to give some pointers on how I might go about socially navigating such events, particularly as a newcomer…
I really like Scuba (Meta's internal real-time database system)
I wrote a tiny blog post about Scuba's UI (oriented towards exploratory, real time analysis of time series databases), and how difficult it has been to find an equivalent concept in OSS…
Robotics Worldwide Workshop
"All Robotics in 90 minutes" lightning talks and a panel discussion on the future of robotics. Not for the faint of heart…
DataMap: A Portable Application for Visualizing High-Dimensional Data
The visualization and analysis of high-dimensional data are essential in biomedical research. There is a need for secure, scalable, and reproducible tools to facilitate data exploration and interpretation. Results: We introduce DataMap, a browser-based application for visualization of high-dimensional data using heatmaps, principal component analysis (PCA), and t-distributed stochastic neighbor embedding (t-SNE). DataMap runs in the web browser, ensuring data privacy while eliminating the need for installation or a server. The application has an intuitive user interface for data transformation, annotation, and generation of reproducible R code…
Recent advances have shown that reinforcement learning (RL) can induce advanced reasoning capabilities in language models. Training language models with the right objective allows them to learn that using more inference compute is beneficial for performance. This behavior, known as inference-time scaling, emerges with sufficient RL compute and a well-designed environment. In this post, we explore how scaling RL compute further could unlock even greater inference-time capabilities. We address topics like reasoning priors, sequential inference, parallel inference, and sources of reward to train more general reasoners…
So You Want to Work in Mechanistic Interpretability?
If you’re here, you know that Mechanistic Interpretability is a rapidly evolving field at the intersection of machine learning, neuroscience, and systems engineering. At Anthropic, we are looking for exceptional individuals who can help us understand how language models work at a fundamental level and use that understanding to make AI systems safer and more reliable. As this is a new field, it requires a number of diverse skills that many common career paths do not expose a person to!…We wrote this update to help motivated people from different backgrounds interested in Interpretability develop the other skills necessary to contribute…
Introducing multideploy: Streamline file deployments across multiple repositories
If you’re managing multiple GitHub repositories, you’ve probably encountered the frustration of maintaining consistent files across them all. Whether it’s standardizing CI workflows, license files, code style configurations, or security policies, keeping everything in sync can be tedious and error-prone. How do you update a GitHub Actions workflow across 50 repositories? What about implementing a new organizational guide across dozens of repos?…Introducing multideploy…
An Illustrated Guide to Automatic Sparse Differentiation
In numerous applications of machine learning, Hessians and Jacobians exhibit sparsity, a property that can be leveraged to vastly accelerate their computation. While the usage of automatic differentiation in machine learning is ubiquitous, automatic sparse differentiation (ASD) remains largely unknown. This post introduces ASD, explaining its key components and their roles in the computation of both sparse Jacobians and Hessians. We conclude with a practical demonstration showcasing the performance benefits of ASD…
Last Week's Newsletter's 3 Most Clicked Links
* Based on unique clicks.
** Find last week's issue #596 here.
Cutting Room Floor
Anaconda Licensing Changes - How to use Conda without running afoul of Anaconda licensing
AGI is not a milestone - There is no capability threshold that will lead to sudden impacts
State of play of AI progress (and related brakes on an intelligence explosion)
Whenever you're ready, 2 ways we can help:
Looking to get a job? Check out our “Get A Data Science Job” Course
It is a comprehensive course that teaches you everything related to getting a data science job based on answers to thousands of emails from readers like you. The course has 3 sections: Section 1 covers how to get started, Section 2 covers how to assemble a portfolio to showcase your experience (even if you don’t have any), and Section 3 covers how to write your resume.
Promote yourself/organization to ~68,000 subscribers by sponsoring this newsletter. 30-40% weekly open rate.
Thank you for joining us this week! :)
Stay Data Science-y!
All our best,
Hannah & Sebastian