Data Science Weekly - Issue 583
Curated news, articles and jobs related to Data Science, AI, & Machine Learning
Issue #583
January 23, 2025
Hello!
Once a week, we write this email to share the links we thought were worth sharing in the Data Science, ML, AI, Data Visualization, and ML/Data Engineering worlds.
And now…let's dive into some interesting links from this week.
Editor's Picks
Cellm: Use LLMs in Excel formulas
Cellm is an Excel extension that lets you use Large Language Models (LLMs) like ChatGPT in cell formulas…
The Rise of Single-Node Processing: Challenging the Distributed-First Mindset
2024 witnessed growing interest in single-node processing frameworks, with tools like DuckDB, Apache DataFusion, and Polars receiving increased attention and gaining unprecedented popularity from the data community. This trend represents more than just a technological advancement; it marks a fundamental reassessment of how we approach data analytics…In this article, I will dive deeper into the subject, exploring it in greater detail and providing further insights…
Observability: the present and future, with Charity Majors
Our conversation explores the ever-changing world of observability, covering these topics:
• What is observability? Charity’s take
• What is “Observability 2.0?”
• Why Charity is a fan of platform teams
• Why DevOps is an overloaded term, and probably no longer relevant
• What is cardinality? And why does it impact the cost of observability so much?
• How OpenTelemetry solves for vendor lock-in
• And more!
What’s on your mind
This Week’s Poll:
Jupyter Notebooks?
[Take this quick 5-second poll →]
We’ll share the results next week!
Last Week’s Poll:
I’m very curious what the something else is. If you clicked on that, hit reply and let me know and I’ll share with the group :)
Data Science Articles & Videos
Batch Inference vs Online Inference
How do you deploy your model so that it can be used by others and generate real value? A web search may point you to tutorials discussing how to stand up a Flask front-end that serves your model. But does that architecture actually fit your use-case?…The first question you need to answer is whether you should use batch inference or online inference to serve your models. What are the differences between these approaches? When should you favor one over the other? And how does this choice influence the technical details of the model deployment? In the following sections we’ll answer each of these questions and provide real world examples of both batch and online inference…
So you wanna write Kubernetes controllers?
What they don't tell you about developing scalable and reliable controllers…
Are there any ways to earn a little extra money on the side as a data scientist? [Reddit Discussion]
Using data science skills (otherwise I'm sure there are plenty)…I know there is data annotation, but I'm not sure that qualifies as data science…
Working with colours in R
Whether you're building data visualisations or generative art, at some point you will likely need to consider which colours to use in R. This blog post describes different ways to define colours, how to make good choices about colour palettes, and ways to generate your own colour schemes…
Making sense of commonality analysis
Using commonality analysis to identify unique and shared (common) variance across three or more variables…
TwoTimeScales: Analysis of Event Data with Two Time Scales
Analyse time to event data with two time scales by estimating a smooth hazard that varies over two time scales and also, if covariates are available, by estimating a proportional hazards model with such a two-dimensional baseline hazard. Functions are provided to prepare the raw data for estimation, to estimate and to plot the two-dimensional smooth hazard. Extensions to a competing risks model are implemented. For details about the method please refer to Carollo et al…
Data Have a Limited Shelf Life
Data, unlike some wines, do not improve with age. The contrary view, that data are immortal, a view that may underlie the often-observed tendency to recycle old examples in texts and presentations, is illustrated with three classical examples and rebutted by further examination. Some general lessons for data science are noted, as well as some history of statistical worries about the effect of data selection on induction and related themes in recent histories of science…
Playing with the classification report
In this video we will play around with a confusion matrix widget that will help us understand how the numbers in the classification report in scikit-learn are created. The classification report is a great utility, but it can help to remind oneself of what the numbers really mean…
Modern Polars – A side-by-side comparison of the Polars and Pandas libraries
This is a side-by-side comparison of the Polars and Pandas dataframe libraries, based on Modern Pandas…The bulk of this book is structured examples of idiomatic Polars and Pandas code, with commentary on the API and performance of both…For the most part, I argue that Polars is “better” than Pandas, though I do try and make it clear when Polars is lacking a Pandas feature or is otherwise disappointing…
Causal Inference Meets Deep Learning: A Comprehensive Survey
The article describes the integration of causal inference with traditional deep learning algorithms and illustrates its application to large model tasks as well as specific modalities in deep learning. The current limitations of causal inference and future research directions are discussed. Moreover, the commonly used benchmark datasets and the corresponding download links are summarized…
How to switch from pyenv to uv for managing Python versions
This guide shows how to transition from using pyenv to uv for managing Python versions. While pyenv has been a reliable tool for many years, uv offers faster performance and more integrated workflows…
MLOps engineers: What exactly do you do on a daily basis in your MLOps job? [Reddit Discussion]
I am trying to learn more about MLOps as I explore this field. It seems very DevOpsy, but also maybe a bit like data engineering? Can a current working MLOps person explain what they do on a day to day basis? Like, what kind of tasks, what kind of tools do you use, etc? Thanks!…
The Mathematics of Artificial Intelligence
This article focuses on the application of analytical and probabilistic tools to model neural network architectures and better understand their optimization. Statistical questions (particularly the generalization capacity of these networks) are intentionally set aside, though they are of crucial importance…We also shed light on the evolution of ideas that have enabled significant advances in AI through architectures tailored to specific tasks, each echoing distinct mathematical techniques. The goal is to encourage more mathematicians to take an interest in and contribute to this exciting field…
Last Week's Newsletter's 3 Most Clicked Links
* Based on unique clicks.
** Find last week's issue #582 here.
Cutting Room Floor
Whenever you're ready, 2 ways we can help:
Looking to get a job? Check out our “Get A Data Science Job” Course
It is a comprehensive course that teaches you everything related to getting a data science job based on answers to thousands of emails from readers like you. The course has 3 sections: Section 1 covers how to get started, Section 2 covers how to assemble a portfolio to showcase your experience (even if you don’t have any), and Section 3 covers how to write your resume.
Promote yourself/organization to ~65,442 subscribers by sponsoring this newsletter. 35-45% weekly open rate.
Thank you for joining us this week! :)
Stay Data Science-y!
All our best,
Hannah & Sebastian