Data Science Weekly - Issue 588
Curated news, articles and jobs related to Data Science, AI, & Machine Learning
Issue #588
February 27, 2025
Hello!
Once a week, we write this email to share the links we thought were worth sharing in the Data Science, ML, AI, Data Visualization, and ML/Data Engineering worlds.
And now…let's dive into some interesting links from this week.
Editor's Picks
An Overview of Large Language Models for Statisticians
This paper explores potential areas where statisticians can make important contributions to the development of LLMs, particularly those that aim to engender trustworthiness and transparency for human users. Thus, we focus on issues such as uncertainty quantification, interpretability, fairness, privacy, watermarking and model adaptation. We also consider possible roles for LLMs in statistical analysis…
Best Data Engineering 'Influencers' [Reddit]
I am wondering, what are your favourite data engineering 'influencers' (I know this term has a negative connotation)?…In other words, whose blogs, YouTube channels, and podcasts do you like yourself and would you recommend to others? For example I like: Seattle Data Guy, freeCodeCamp, Tech With Tim…
The State of Machine Learning Competitions in 2024
We summarise the state of the ML competitions landscape and analyse the hundreds of competitions that took place in 2024. Plus an overview of winning solutions and commentary on techniques used…
What’s on your mind
This Week’s Poll:
What Paid Features Would You Most Value in Data Science Weekly?
I'm exploring how to enhance Data Science Weekly to better serve you. Could you help me by selecting the features you'd genuinely find valuable enough to consider a paid subscription ($7/month)?
Take this quick 5-second poll →
We’ll share the results next week!
Data Science Articles & Videos
Was Harvey Weinstein thanked more often than God at the Oscars?
I analysed almost 2,000 Oscar speeches to discover if the claim that Harvey Weinstein was thanked more often than God is true. Plus, we'll find out which Hollywood icon is bigger than both of them…
LLM (ML) Job Interviews (Fall 2024) - Process
A retelling of my experience interviewing for ML/LLM research science/engineering focused roles in Fall 2024…This post has two parts: 1) Job Search Mechanics (including context, applying, and industry information), which you can continue reading below, and, 2) Preparation Material and Overview of Questions, which you can read at LLM (ML) Job Interviews - Resources…
Fundamentals of GPU Architecture
This 9-part series on 'Fundamentals of GPU Architecture' covers everything from SIMT cores and warps to programming models…
Geospatial Python Tutorials
Welcome to Spatial Analysis and Remote Sensing Tutorials by Spatial Thoughts. These tutorials complement our Python courses and are suitable for learners who want to advance their skills…Each tutorial is in the form of a self-contained notebook and comes with step-by-step explanation and datasets. Many tutorials also have an accompanying video walkthrough. The preferred way to run each notebook is using Google Colab. Click the rocket icon located at the top of each tutorial to open it on Colab…
TrueSkill Part 2: Who is the GOAT?
In the previous post, we argued that the question of ‘Who is the Greatest Of All Time?’ for any competitive game is answerable with an algorithm. Such an algorithm can account for the facts that skills vary over time and that many players never played each other in their prime…In this post, we’ll use a variant of this algorithm, ‘TrueSkill Through Time,’ and match data to answer the ‘Who is the GOAT?’ question for Tennis, Boxing, and Warcraft 3…
Designing a Table Format for ML Workloads
In recent years the concept of a table format has really taken off, with explosive growth in technologies like Iceberg, Delta, and Hudi. With so many great options, one question I hear a lot is variations of "why can't Lance use an existing format like ...?"…In this blog post I will describe the Lance table format and hopefully answer that question. The very short TL;DR: existing table formats don't handle our customers' workflows. Basic operations require too much data copy, are too slow, or cannot be parallelized…
How Many Episodes Should You Watch Before Quitting a TV Show?
When to quit a subpar TV show, according to the data…
What are the top three technical skills or platforms to learn, NOT named R, Python, SQL, or any of the BI platforms (e.g. Tableau, PowerBI)? [Reddit]
E.g. Alteryx, OpenAI, etc?…
Practical Quantization in PyTorch
Quantization is a cheap and easy way to make your DNN run faster and with lower memory requirements. PyTorch offers a few different approaches to quantize your model. In this blog post, we’ll lay a (quick) foundation of quantization in deep learning, and then take a look at what each technique looks like in practice. Finally we’ll end with recommendations from the literature for using quantization in your workflows…
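For readers new to the idea, here is a minimal sketch of what quantizing to 8 bits means, written in plain Python rather than PyTorch. It is not taken from the linked post: the function names `quantize_int8` and `dequantize` are hypothetical, and the scheme shown (per-tensor affine quantization with a scale and zero point) is just one common textbook variant.

```python
def quantize_int8(values):
    """Map floats to integers in [-128, 127] using a scale and zero point.

    Hypothetical helper for illustration; not a PyTorch API.
    """
    # Extend the range so that 0.0 always falls inside it.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    # One integer step covers `scale` units of the float range.
    scale = (hi - lo) / 255 or 1.0
    # The integer that represents (approximately) the float 0.0.
    zero_point = round(-128 - lo / scale)
    quantized = [
        max(-128, min(127, round(v / scale) + zero_point)) for v in values
    ]
    return quantized, scale, zero_point


def dequantize(quantized, scale, zero_point):
    """Recover approximate floats from the integer representation."""
    return [(q - zero_point) * scale for q in quantized]


weights = [-1.0, -0.5, 0.0, 0.25, 1.0]
q, scale, zero_point = quantize_int8(weights)
restored = dequantize(q, scale, zero_point)
# Largest absolute difference between original and restored weights.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Each weight now fits in one byte instead of four, and `max_error` stays within one quantization step (`scale`). That small loss of precision is the trade-off behind the speed and memory gains the post describes.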
DuckDB tricks - renaming fields in a SELECT * across tables
I was exploring some new data, joining across multiple tables, and doing a simple SELECT * as I’d not worked out yet which columns I actually wanted. The issue was that the same field name existed in more than one table. This meant that in the results from the query, it wasn’t clear which field came from which table…
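To make the problem concrete, here is a small sketch that reproduces it and works around it by prefixing every column with its table name. Note the swap: it uses Python's built-in sqlite3 module, not DuckDB, and it is not the trick from the linked post. The tables, the data, and the helper `prefixed_select` are all made up for illustration.

```python
import sqlite3


def prefixed_select(conn, tables):
    """Build a SELECT list that aliases each column as <table>_<column>.

    Hypothetical helper; table names are assumed to be trusted identifiers.
    """
    parts = []
    for table in tables:
        # PRAGMA table_info yields one row per column; index 1 is its name.
        for row in conn.execute(f'PRAGMA table_info("{table}")'):
            column = row[1]
            parts.append(f'"{table}"."{column}" AS "{table}_{column}"')
    return ", ".join(parts)


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (id INTEGER, customer_id INTEGER, name TEXT);
    CREATE TABLE customers (id INTEGER, name TEXT);
    INSERT INTO orders VALUES (1, 10, 'widget');
    INSERT INTO customers VALUES (10, 'Ada');
    """
)

select_list = prefixed_select(conn, ["orders", "customers"])
cursor = conn.execute(
    f"SELECT {select_list} "
    "FROM orders JOIN customers ON orders.customer_id = customers.id"
)
column_names = [description[0] for description in cursor.description]
rows = cursor.fetchall()
```

With a plain SELECT * both tables would contribute columns called `id` and `name`; here `column_names` instead reads `orders_id`, `orders_customer_id`, `orders_name`, `customers_id`, `customers_name`, so each field's origin is unambiguous.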
A socratic dialogue over the utility of DNA language models
I think I, alongside many other people in this field, live in this seemingly parallel universe where we don’t really understand why anyone is working on DNA language models. I say ‘parallel’, because there is obviously a world in which some very smart people are very much bullish about them: specifically the Arc Institute, who, just yesterday, released a paper that many people are quite excited about: Evo 2, a successor to the original Evo model…
To the avid fans of R, I respect your fight for it but honestly curious what keeps you motivated? [Reddit]
I started my career as an R user and loved it! Then, after some years, I started looking for new roles and got the slap of reality that no one asks for R. Gradually made the switch to Python and never looked back. I have nothing against R and I still fend off unreasonable attacks on R by people who never used it calling it only good for ad hoc academic analysis and bla bla. But, is it still worth fighting for?…
How to remedy a badly calibrated machine learning model
Maybe you have a highly accurate model, but it's not calibrated, which means that you cannot use the predict_proba values for decision making. If that's the case we have some good news because there is a remedy in scikit-learn!…
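As a rough illustration of what "not calibrated" means, the sketch below bins predicted probabilities and compares each bin's average prediction with the observed fraction of positives. It is plain Python with made-up data and hypothetical function names, not the method from the linked piece; in scikit-learn, `calibration_curve` offers a similar diagnostic and `CalibratedClassifierCV` is the usual tool for the fix.

```python
def reliability_table(y_true, y_prob, n_bins=5):
    """Return (mean predicted prob, observed positive rate, count) per bin.

    Hypothetical helper using equal-width bins over [0, 1].
    """
    bins = [[] for _ in range(n_bins)]
    for label, prob in zip(y_true, y_prob):
        # A probability of exactly 1.0 goes into the last bin.
        index = min(int(prob * n_bins), n_bins - 1)
        bins[index].append((label, prob))
    table = []
    for members in bins:
        if not members:
            continue  # skip bins with no predictions
        mean_pred = sum(prob for _, prob in members) / len(members)
        frac_pos = sum(label for label, _ in members) / len(members)
        table.append((mean_pred, frac_pos, len(members)))
    return table


def expected_calibration_error(y_true, y_prob, n_bins=5):
    """Count-weighted average gap between predicted and observed rates."""
    total = len(y_true)
    return sum(
        count / total * abs(mean_pred - frac_pos)
        for mean_pred, frac_pos, count in reliability_table(
            y_true, y_prob, n_bins
        )
    )


# Made-up example: the model says 0.9 every time, but is right half the time.
labels = [1, 0, 1, 0]
overconfident = [0.9, 0.9, 0.9, 0.9]
ece_overconfident = expected_calibration_error(labels, overconfident)

# Same labels, but the model honestly reports 0.5.
honest = [0.5, 0.5, 0.5, 0.5]
ece_honest = expected_calibration_error(labels, honest)
```

A well-calibrated model has a gap near zero in every bin; the overconfident example above shows a gap of about 0.4, which is why its raw probabilities should not drive decisions.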
Last Week's Newsletter's 3 Most Clicked Links
* Based on unique clicks.
** Find last week's issue #587 here.
Cutting Room Floor
Whenever you're ready, 2 ways we can help:
Looking to get a job? Check out our “Get A Data Science Job” Course
It is a comprehensive course that teaches you everything related to getting a data science job based on answers to thousands of emails from readers like you. The course has 3 sections: Section 1 covers how to get started, Section 2 covers how to assemble a portfolio to showcase your experience (even if you don’t have any), and Section 3 covers how to write your resume.
Promote yourself/organization to ~66,600 subscribers by sponsoring this newsletter. 35-45% weekly open rate.
Thank you for joining us this week! :)
Stay Data Science-y!
All our best,
Hannah & Sebastian