Data Science Weekly - Issue 533
Curated news, articles and jobs related to Data Science, AI, & Machine Learning
Issue #533
February 08, 2024
Hello!
Once a week, we write this email to share the links we thought were worth sharing in the Data Science, ML, AI, Data Visualization, and ML/Data Engineering worlds.
If you like what you read, consider becoming a paid member here: https://datascienceweekly.substack.com/subscribe :)
And now…let's dive into some interesting links from this week.
Editor's Picks
Mission Critical -- Satellite Data is a Distinct Modality in Machine Learning
This position paper argues that satellite data constitutes a distinct modality for machine learning research and that we must recognize it as such to advance the quality and impact of SatML research across theory, methods, and deployment. We outline critical discussion questions and actionable suggestions to transform SatML from merely an intriguing application area to a dedicated research discipline that helps move the needle on big challenges for machine learning and society…
How I Know Your Data Science/ML Project Will Fail Before You Even Begin
With a high probability, I can tell that your data science or machine learning project will fail before you even begin! We’ve seen hundreds of data projects over the past 10+ years and distilled the patterns that correlate with success…
Artificial and Biological Intelligence: Humans, Animals, and Machines
I believe a highly promising direction in AI research is to use artificial intelligence to better understand biological intelligence, and conversely, to use our understanding of biological intelligence to better understand how artificial intelligence works…
A Message from this week's Sponsor:
New Infrastructure to Build Knowledgeable AI
Learn how Pinecone's new serverless vector database helps Notion, Gong, and CS DISCO optimize their AI infrastructure from our VP of R&D, Ram Sriharsha:
Up to 50x lower costs because of the separation of reads, writes, and storage
O(s) fresh results with vector clustering over blob storage
Fast search without sacrificing recall powered by industry-first indexing and retrieval algorithms
Powerful performance with a multi-tenant compute layer
Zero configuration or ongoing management
Read the technical deep dive to understand how it was built and the unique considerations that needed to be made.
* Want to sponsor the newsletter? Email us for details --> team@datascienceweekly.org
Data Science Articles & Videos
The Mind-Boggling Reach of Super Bowl Commercials: A Statistical Analysis
This year's Super Bowl will attract an estimated 120 million viewers, making this program the second-most watched broadcast of all time, after the moon landing…There will be 70 big-budget commercials interspersed throughout the game, with each advertisement costing an absurd $7 million per 30-second timeslot. And yet, unlike all other nights when commercials are a hindrance, viewers will relish these ads, dazzled by the wonders of corporate marketing. So today, we'll explore The Super Bowl's ever-growing cultural dominance, America's odd infatuation with Super Bowl advertisements, and the absurd reach of these commercials…
The Case for Open Source AI
Open source is indisputably one of the biggest drivers of progress in software and by extension AI. The field would be unrecognizable without it. However, it is under existential threat from regulation that will advantage entrenched interests. We believe that open AI is vital for research, innovation, competition, and safety. We must defend it vigorously…
Gradient-based trajectory planning
how much i trust gradient descent. crazy, right? yes. i then succumbed to this temptation and looked for some simple example to test my trust in gradient descent. yes, i know that i should never doubt our lord Gradient Descent, but my belief is simply too weak. so, i decided to use gradient descent for simple trajectory planning given a 2D map…
AMD CTO is here [Reddit]
Hey guys, I introduced Mark Papermaster to this subreddit today. He said he will check it out. He was very kind and nice. We are very much like the new Homebrew Computing club. What questions and requests do you have for Mark?…
Great Tables - Absolutely Delightful Table-making in Python
With Great Tables anyone can make wonderful-looking tables in Python. The philosophy here is that we can construct a wide variety of useful tables by working with a cohesive set of table components. You can mix and match things like a header and footer, attach a stub (which contains row labels), arrange spanner labels over top of the column labels, and much more. Not only that, but you can format the cell values in a variety of awesome ways…
SQL for the Weary in 100 queries
Learning outcomes:
Explain the difference between a database and a database manager.
Write SQL to select, filter, sort, group, and aggregate data.
Define tables and insert, update, and delete records.
Describe different types of join and write queries that use them to combine data.
Use windowing functions to operate on adjacent rows.
Explain what transactions are and write queries that roll back when constraints are violated.
Explain what triggers are and write SQL to create them.
Manipulate JSON data using SQL.
Interact with a database using Python directly, from a Jupyter notebook, and via an ORM…
MLX Community Projects
Let's collect some cool MLX integrations and community-led projects here for visibility! If you have a project you would like to feature, leave a comment, and we will add it…
Estimating Above Ground Biomass using Random Forest Regression in GEE
In this post, we will learn how to build a regression model in Google Earth Engine and use it to estimate total above-ground biomass using openly available Earth Observation datasets…
The Many Ways to Deploy a Model
Over the past years, we have been helping companies deploy a wildly diverse set of ML workloads in production. Last year, we added open-source large language models (LLMs) in the mix. Continuing the line of research we started with NVIDIA, we recently collaborated with Hamel Husain, an LLM expert at Parlance Labs, to explore various popular solutions to model serving in general, LLM inference in particular. In this article, we share our decision rubric for model deployments using LLM inference as an example…
Anyone else’s company executives losing their shit over GenAI? [Reddit]
The company I work for (large company serving millions of end-users), appear to have completely lost their minds over GenAI. It started quite well. They were interested, I was in a good position as being able to advise them…However, now they are just trying to shoehorn gen AI wherever they can for the sake of the investors. They are not making rational decisions anymore. They aren't even asking me about it anymore. Some exec wakes up one day and has a crazy misguided idea about sticking gen AI somewhere and then asking junior (non DS) devs to build it without DS input. All the while, traditional ML is actually making the company money, projects are going well, but getting ignored. Does this sound familiar?…
You Just Said Something Wrong About Logistic Regression
Congratulations, you just said something wrong about logistic regression. That’s OK, logistic regression is hard and we all have to learn/re-learn some things from time to time…This is a living blog post intended to address some common misconceptions or flat out wrong statements I’ve seen people make about logistic regression…
LoRA From Scratch – Implement Low-Rank Adaptation for LLMs in PyTorch
LoRA, which stands for Low-Rank Adaptation, is a popular technique to finetune LLMs more efficiently. Instead of adjusting all the parameters of a deep neural network, LoRA focuses on updating only a small set of low-rank matrices. This Studio explains how LoRA works by coding it from scratch, which is an excellent exercise for looking under the hood of an algorithm…
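To make the low-rank idea concrete, here is a minimal sketch in plain Python (not the PyTorch code from the linked Studio). It assumes the usual LoRA formulation: a frozen weight matrix W plus a scaled product of two small trainable matrices, B and A, with B starting at zero. The class name `LoRALinear`, the helper `matvec`, and the `alpha` and `seed` parameters are our own illustrative choices.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update: W x + (alpha / r) * B (A x)."""

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight          # pretrained weights, never updated
        self.scale = alpha / rank     # hypothetical scaling hyperparameter
        # A starts small and random, B starts at zero, so the update is zero at first.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x):
        base = matvec(self.weight, x)
        update = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_params(self):
        """Only A and B would be trained; W stays frozen."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


rng = random.Random(1)
W = [[rng.gauss(0.0, 1.0) for _ in range(64)] for _ in range(64)]
layer = LoRALinear(W, rank=4)
full_params = len(W) * len(W[0])
lora_params = layer.trainable_params()
```

In this toy setup the full matrix has 64 x 64 = 4,096 entries, while the two low-rank matrices together hold 4 x 64 + 64 x 4 = 512 trainable values. Because B starts at zero, the layer initially behaves exactly like the frozen one.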
Training & Resources
UW’s LING 575: NLP for Cultural Analytics
Surveys tools, frameworks, and skills needed to apply natural language processing methods to applications in the humanities and social sciences, with a focus on the analysis of large digital text corpora, including social media, literature, and historical documents. Topics will include data collection, text processing and machine learning techniques, data visualization, and ethical considerations…
CMU’s Advanced NLP Spring 2024
[Video lectures on YouTube here] CS11-711 Advanced Natural Language Processing is an introductory graduate-level course on natural language processing aimed at students who are interested in doing cutting-edge research in the field. In it, we describe fundamental tasks in natural language processing such as syntactic, semantic, and discourse analysis, as well as methods to solve these tasks. The course focuses on modern methods using neural networks and covers the basic modeling and learning algorithms required. The class culminates in a project in which students attempt to reimplement and improve upon a research paper in a topic of their choosing…
The Math Behind the Adam Optimizer
You’ve likely heard about Adam, a name that has gained notable recognition in many winning Kaggle competitions. It’s common to experiment with a few optimizers like SGD, Adagrad, Adam, or AdamW, but truly understanding their mechanics is a different story. By the end of this post, you’ll be among the select few who not only know about Adam optimization but also understand how to leverage its power effectively…
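For a feel of the mechanics, here is a small sketch of the standard Adam update rule in plain Python, applied to a one-dimensional toy problem. It is not taken from the linked post. The function name `adam_minimize`, the quadratic objective, the learning rate of 0.1, and the step count are our own illustrative assumptions; the beta and epsilon defaults are the commonly cited ones.

```python
import math


def adam_minimize(grad, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    """Minimize a 1-D function with Adam, given a function returning its gradient."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g        # running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g    # running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)           # bias correction for the zero start
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


# Toy objective f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
x_min = adam_minimize(lambda x: 2 * (x - 3.0), x0=0.0)
```

The step size is scaled by the ratio of the two running averages, so early steps are roughly `lr` in magnitude regardless of how large the raw gradient is. On this toy problem `x_min` should land close to 3.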
Last Week's Newsletter's 3 Most Clicked Links
What distinguishes production-grade data pipelines from amateur setups?
Forecast Evaluation for Data Scientists: Common Pitfalls and Best Practices
* Based on unique clicks.
** Find last week's issue #532 here.
Whenever you're ready, 2 ways we can help:
Looking to get a job? Check out our “Get A Data Science Job” Course
A comprehensive course that teaches you everything related to getting a data science job based on answers to thousands of emails from readers like you. The course has 3 sections: Section 1 covers how to get started, Section 2 covers how to assemble a portfolio to showcase your experience (even if you don’t have any), and Section 3 covers how to write your resume.
Promote yourself/organization to ~61,000 subscribers by sponsoring this newsletter. 35-45% weekly open rate.
Thank you for joining us this week! :)
Stay Data Science-y!
All our best,
Hannah & Sebastian
P.S. Pay us some money :) The membership program funds the free newsletter: https://datascienceweekly.substack.com/subscribe
P.P.S. “A SQL query walks into a bar, sees two tables, and asks, 'May I join you?'”
Copyright © 2013-2024 DataScienceWeekly.org, All rights reserved.