Data Science Weekly - Issue 491

Curated news, articles and jobs related to Data Science

Apr 21, 2023

Issue #491
April 20 2023

Hello and thank you for tuning in to Issue #491.

Once a week we write this email to share the links we thought were worth sharing in the Data Science, ML, AI, Data Visualization, and ML/Data Engineering worlds.

Seeing this for the first time? Subscribe here:

If this is useful for your work, please consider becoming a paid subscriber here:
https://datascienceweekly.substack.com/subscribe

If you don’t find this useful, unsubscribe here.

Hope you enjoy it!

And now, let's dive into some interesting links from this week:

Editor's Picks

Choose Your Weapon: Survival Strategies for Depressed AI Academics
A growing number of AI academics can no longer find the means and resources to compete at a global scale. This is a somewhat recent phenomenon, but an accelerating one, with private actors investing enormous compute resources into cutting edge AI research. Here, we discuss what you can do to stay competitive while remaining an academic. We also briefly discuss what universities and the private sector could do to improve the situation, if they are so inclined. This is not an exhaustive list of strategies, and you may not agree with all of them, but it serves to start a discussion…

As a developer, the current rate of ML progress doesn't make any sense to me [Reddit Discussion]
When I take a look at the progress in ML, it seems to go like this:
- Paper gets released. A week later everyone is already an expert on the topic. They have 100% understood it, implemented the method, become proficient at using it and identified the issues
- A few days later a completely new idea that improves upon the paper has already been implemented, tested, and an entire paper has been written and released
How does this make sense to any of you? 99.99% of developers I know would need months to become good enough at using a tool in any professional capacity…

Please Stop Drawing Neural Networks Wrong: The case for GOOD diagrams
Aaron Master and Doron Bergman argue that most drawings of neural networks are "confusing, incomplete, and probably wrong." Instead, they propose creating "GOOD" (Generally Objective Observable Depiction) diagrams, and show what these would look like…

A Message from this week's Sponsor:

Track every customer interaction in real-time and gain a deep understanding of your customers’ behavior

Segment Unify allows you to unite online and offline customer data in real-time across every platform and channel. Use Segment Profiles Sync to send identity resolved customer profiles to your data warehouse, where they can be used for advanced analytics and enhanced with valuable data-at-rest. Then use Segment Reverse ETL to immediately activate your ‘golden’ profiles across your CX tools of choice.

Want to sponsor the newsletter? Email us for details --> team@datascienceweekly.org

Data Science Articles & Videos

Building a Tree-Structured Parzen Estimator from Scratch (Kind Of)
An alternative to traditional hyperparameter tuning methods…Finding the best set of hyperparameters (often called “tuning”) is one of the most important and time consuming parts of the modeling task…In this article we will tackle a relatively new approach for hyperparameter tuning — the Tree-Structured Parzen Estimator (TPE) — and understand its function programmatically through a step-by-step implementation in Python…
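
To make the idea concrete, here is a minimal one-dimensional sketch of the TPE loop in plain Python. It is not the article's implementation: the toy objective, the fixed bandwidth, and every default value are assumptions chosen for illustration, and with a single parameter there is no tree structure to show. The loop splits past trials into "good" and "bad" by a loss quantile, fits a Parzen (kernel density) estimate to each group, and evaluates next the candidate with the highest good-to-bad density ratio.

```python
import math
import random


def objective(x):
    # Hypothetical toy loss with its minimum at x = 2 (an assumption for the demo).
    return (x - 2.0) ** 2


def kde_pdf(x, points, bandwidth):
    # Parzen estimate: the average of Gaussian kernels centered on each point.
    norm = bandwidth * math.sqrt(2.0 * math.pi)
    total = sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points)
    return total / (len(points) * norm)


def tpe_minimize(objective, low, high, n_init=10, n_iter=40,
                 gamma=0.25, n_candidates=24, bandwidth=0.5, seed=0):
    if n_init < 2:
        raise ValueError("n_init must be at least 2 so both groups are non-empty")
    rng = random.Random(seed)

    # Start with a few uniformly random trials.
    history = [(x, objective(x))
               for x in (rng.uniform(low, high) for _ in range(n_init))]

    for _ in range(n_iter):
        # Split trials: the best gamma fraction is "good", the rest is "bad".
        history.sort(key=lambda pair: pair[1])
        n_good = max(1, int(gamma * len(history)))
        good = [x for x, _ in history[:n_good]]
        bad = [x for x, _ in history[n_good:]]

        # Draw candidates around good points (a good point plus Gaussian noise),
        # clipped to the search bounds.
        candidates = [
            min(high, max(low, rng.gauss(rng.choice(good), bandwidth)))
            for _ in range(n_candidates)
        ]

        # Evaluate the candidate with the largest density ratio l(x) / g(x).
        chosen = max(
            candidates,
            key=lambda c: kde_pdf(c, good, bandwidth)
            / (kde_pdf(c, bad, bandwidth) + 1e-12),
        )
        history.append((chosen, objective(chosen)))

    # Return the best (x, loss) pair seen.
    return min(history, key=lambda pair: pair[1])


best_x, best_loss = tpe_minimize(objective, low=-5.0, high=5.0)
print(f"best x = {best_x:.3f}, loss = {best_loss:.5f}")
```

Libraries that implement TPE typically adapt the bandwidths and handle conditional, multi-dimensional search spaces; the fixed bandwidth here only keeps the sketch short.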

Every tree counts: Large-scale mapping of canopy height at the resolution of individual trees
In this article, we describe how we leveraged internal state-of-the-art AI technology and collaborated with the World Resources Institute (WRI) to develop a method to map forests, tree by tree, across areas the size of continents. As an example, we mapped the U.S. state of California and São Paulo, Brazil, and are making the data public and freely available…

Any American data scientists or ML engineers here ever work abroad? What was the data science scene and job market like in your new host country? [Reddit Discussion]
I am curious to ask whether any US data scientists or MLEs here have worked abroad before…Where did you move to, and what was the data science scene like there? Was there a good job market for where you lived? And did you end up going back to the US, or did you decide to stay? Thanks!…

The Anatomy of Autonomy: Why Agents are the next AI Killer App after ChatGPT
Auto-GPT/BabyAGI Executive Summary, a Brief History of Autonomous Agentic AI, and Predictions for Autonomous Future…

The landscape of biomedical research
We visualized the entire PubMed library, 21 million biomedical and life science papers, and…we present a 2D atlas of the entire corpus of biomedical literature, and argue that it provides a unique and useful overview of the life sciences research…Furthermore, we present an interactive web version of our atlas that allows easy exploration and will enable further insights and facilitate future research…

Here's why I think you should get into AI right now
If you are interested in it, here's why I think you should get into AI right now….Disclaimer: If you are looking for some AI hype thread, go somewhere else…TL;DR when a new thing comes along, there is a window of time where nobody is an expert. There are only people interested in it, playing around with it, and talking with each other. But eventually the thing matures and the window closes. After that, barriers to entry are much higher…

Convenient Bayesian Marketing Mix Modeling with PyMC Marketing
You can tell the importance of a topic by how many big companies are releasing software packages on it. In the field of marketing mix modeling, Google released LMMM, Meta released Robyn, PyMC Labs released PyMC Marketing (and I released mamimo 😇)…Even better than marketing mix modeling is Bayesian marketing mix modeling, which Google’s and PyMC Labs’ libraries provide. While LMMM is certainly interesting as well, today, we will focus on PyMC Marketing. In this article, you will learn how easy it is to build a state-of-the-art Bayesian marketing mix model nowadays!…

Building Machine Learning Applications that Empower Policymakers with Insights to Support Vulnerable Communities
A conversation about the nuances of applying machine learning algorithms to Earth observation for global development organizations…

AI / ML / LLM / Transformer Models Timeline
This is a collection of important papers in the area of Large Language Models and Transformer Models. It focuses on recent development, especially from mid-2022 onwards, and in no way claims to be exhaustive. It is actively updated and the graph is clickable!…

Solving brain dynamics gives rise to flexible machine-learning models
MIT CSAIL researchers solve a differential equation behind the interaction of two neurons through synapses to unlock a new type of speedy and efficient AI algorithm…

The one about AI
Optimists and pessimists agree that AI will change the world. If it goes wrong, AI may gain sentience and destroy us all. Or it goes well, and it gives us superpowers…there’ll be winners and losers – everyone agrees…I suspect we’ll get part of both futures. AI will be integrated into a lot of things and become like the Bayesian spam filters that now seem obvious and simple. It’ll be implemented in places it doesn’t belong, and cause havoc. Jobs will shift, some becoming more in demand and others less. Enough context, let’s talk about history and vibes and happiness…AI feels like a reshuffling…

kNN vs. SVM
Note on k-Nearest Neighbor lookups on embeddings: in my experience much better results can be obtained by training SVMs instead. Not too widely known…Short example…
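
The comparison can be sketched roughly as follows. This is not the example the note links to; it is an illustration under stated assumptions: the embeddings are random stand-ins, and the `LinearSVC` settings (`C=0.1`, balanced class weights) are illustrative choices, not recommendations from the source. The kNN lookup ranks items by dot-product similarity to the query, while the SVM variant trains a linear classifier with the query as the only positive example and ranks items by their decision score.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Hypothetical data: 1000 unit-normalized 64-dimensional embeddings and one query.
embeddings = rng.normal(size=(1000, 64))
embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)
query = rng.normal(size=64)
query /= np.linalg.norm(query)


def knn_top_k(query, embeddings, k=5):
    # Rank by dot product (cosine similarity, since vectors are unit-normalized).
    sims = embeddings @ query
    return np.argsort(-sims)[:k]


def svm_top_k(query, embeddings, k=5, C=0.1):
    # Train a linear SVM: the query is the single positive, every item is a negative.
    x = np.vstack([query[None, :], embeddings])
    y = np.zeros(len(x), dtype=int)
    y[0] = 1
    # Hyperparameters are assumptions for this sketch.
    clf = LinearSVC(class_weight="balanced", C=C, max_iter=10000, tol=1e-6)
    clf.fit(x, y)
    # Rank the items by signed distance to the learned hyperplane.
    scores = clf.decision_function(embeddings)
    return np.argsort(-scores)[:k]


print("kNN top 5:", knn_top_k(query, embeddings))
print("SVM top 5:", svm_top_k(query, embeddings))
```

The SVM has to be retrained for every query, so this trades extra compute per lookup for a ranking that takes the whole collection into account.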

Jobs

Machine Learning Researcher @ Figma

We’re looking for engineers with a background in machine learning & artificial intelligence to improve our products and build new capabilities. You'll be driving fundamental and applied research in this area. You’ll be combining industry best practices and a first-principles approach to design and build ML models and infrastructure that will improve Figma’s design and collaboration tool.

What you’ll do at Figma:

- Drive fundamental and applied research in ML/AI, with Figma product use cases in mind
- Formulate and implement new modeling approaches both to improve the effectiveness of Figma’s current models as well as enable the launch of entirely new AI-powered product features
- Work in concert with other ML researchers, as well as product and infrastructure engineers to productionize new models and systems to power features in Figma’s design and collaboration tool
- Explore the boundaries of what is possible with the current technology set and experiment with novel ideas.

Apply here

Want to post a job here? Email us for details --> team@datascienceweekly.org

Training & Resources

Want to learn about meta-learning & few-shot learning?
All of the latest lecture videos for Stanford CS330 are now online!…New topics include: - self-supervised pre-training - large scale meta-optimization - domain adaptation & generalization…

Tidy-finance-with-R (python version)
This is Python code translated from the book Tidy Finance with R, which you can get from https://www.tidy-finance.org/…The material here consists of Python code translated from the book...Most of my code reproduces in Python the R code from this amazing book, but some of my code may not be that 'tidy'…

How to Run Surveys: A guide to creating your own identifying variation and revealing the invisible [PDF]
Surveys are an essential approach for eliciting otherwise invisible factors such as perceptions, knowledge and beliefs, attitudes, and reasoning. These factors are critical determinants of social, economic, and political outcomes. Surveys are not merely a research tool. They are also not only a way of collecting data. Instead, they involve creating the process that will generate the data…This paper offers guidance on the complete survey process, from the design of the questions and experiments to the recruitment of respondents and the collection of data to the analysis of survey responses. It covers issues related to the sampling process, selection and attrition, attention and carelessness, survey question design and measurement, response biases, and survey experiments…

Last Week's Newsletter's 3 Most Clicked Links

Building LLM applications for production

Time-Series Forecasting: Deep Learning vs Statistics — Who Wins?

SQL Is All You Need

* Based on unique clicks.
** Find last week's issue #490 here.

Cutting Room Floor

Thanks for joining us this week :)

All our best,
Hannah & Sebastian

P.S.,
Please consider becoming a paid subscriber here: https://datascienceweekly.substack.com/subscribe

:)

Data Science Weekly Newsletter
