Issue #486
March 16 2023
Hello and thank you for tuning in to Issue #486.
This is Hannah and Sebastian, curators of the Data Science Weekly newsletter.
We appreciate your support :)
Once a week we write this email to share the links we thought were worth sharing in the Data Science, ML, AI, Data Visualization, and ML/Data Engineering worlds.
If you find this useful, please consider becoming a paid subscriber here:
https://datascienceweekly.substack.com/subscribe
Hope you enjoy it.
And now, let's dive into some interesting links from this week:
Editor's Picks
Having an existential crisis, need some motivation [Reddit Discussion]
With the recent exponential growth in DL (GPT4, Palm-e, llama, stable diffusion etc) it just seems impossible to catch up. Also I read somewhere that with the current rate of progress, AGI is only few years away (maybe in 2030s), and it feels like once AGI is achieved it will all be over and here I am still wrapping my head around back propagation in a jupyter notebook running on a sh*t laptop gpu, it just feels pointless…Maybe this is dumb, anyway I would love to hear what you guys have to say. Some words of motivation will be helpful :) Thanks…
Here’s What It Would Take To Slow or Stop AI
Probably the most common genre of bad AI take is, “maybe we should press pause on AI development until we figure all this out.” There are two types of people who float this idea:
Those who have no clue what they’re actually asking for, because they either don’t know enough about AI or they don’t know enough about the way the world works outside of computers.
Those who know exactly what their suggestion implies, but think it’s preferable to whatever AGI doom scenario they envision as the alternative.
I’ve written this post for people who, for whatever reason of ignorance about machine learning or cluelessness about offline reality, find themselves in the first camp, above…As for the people in the second camp, I respect them because they already know everything I’m about to say but are forging ahead anyway…
New 3Blue1Brown Video: But what is the Central Limit Theorem?
A visual introduction to probability's most important theorem…
A Message from this week's Sponsor:
Check out the results of the “MLOps is more than just tools” survey among ML practitioners
TheSequence partnered with Toloka to explore what MLOps culture looks like across the industry at the start of 2023. A huge variety of tools are available for ML development, but the culture and practices still have some catching up to do. TheSequence asked their community of over 155,000 data scientists, ML engineers and AI enthusiasts to share their thoughts about it. Toloka helped to bring in even more insights by promoting the survey in other top newsletters. Finally, TheSequence summarized the results of the survey and prepared the report we are excited to share here.
Want to sponsor the newsletter? Email us for details --> team@datascienceweekly.org
Data Science Articles & Videos
Exploring UK Skill Demands
There is no publicly available data on the skills mentioned within UK job advertisements…this data gap means that these groups have a less-than-complete evidence base on which to inform labour market policies, address regional skill shortages or advise job seekers…To address this gap, Nesta has created the Open Jobs Observatory (OJO), a project to: a) collect millions of online job adverts (with the permission of online job boards); and b) develop a suite of algorithms that extract insights from the text of the job advertisements…
Anyone else witnessing a panic inside NLP orgs of big tech companies? [Reddit Discussion]
I'm in a big tech company working alongside a science team for a product you've all probably used. We have these year-long initiatives to productionalize "state of the art NLP models" that are now completely obsolete in the face of GPT-4. I think at first the science orgs were quiet/in denial. But now it's very obvious we are basically working on worthless technology. And by "we", I mean a large organization with scores of teams. Anyone else seeing this? What is the long term effect on science careers that get disrupted like this?…
dbt_linreg: Linear regression in SQL using dbt
dbt_linreg is an easy way to perform linear regression and ridge regression in SQL (Snowflake, DuckDB, and more) with OLS using dbt's Jinja2 templating…
Data Science or Data Engineer?
I am currently a Data Scientist at a small company. I am considering a switch to data engineering. How can I determine if this is the right career path for me?…
The Multi-modal, Multi-model, Multi-everything Future of AGI
There are lots of screenshots and bad takes flying around, so I figure it would be most useful to do an executive-summary-style recap for GPT-4…
TalkRL: The Reinforcement Learning Podcast - Natasha Jaques
Hear about why OpenAI cites her work in RLHF and dialog models, approaches to rewards in RLHF, ChatGPT, Industry vs Academia, PsiPhi-Learning, AGI and more! Dr Natasha Jaques is a Senior Research Scientist at Google Brain…
Snip a picture and it turns into a datatable in MS Excel... [Twitter Thread]
Analyzing data will never be so easy…
Tencent Data Engineer: Why We Went from ClickHouse to Apache Doris?
This article is co-written by me and my colleague Kai Dai. We are both data platform engineers at Tencent Music (NYSE: TME), a music streaming service provider with a whopping 800 million monthly active users. To drop the number here is not to brag but to give a hint of the sea of data that my poor coworkers and I have to deal with every day…
Exploring the TPC-DS Benchmark Queries with Malloy
I’ve been writing a lot recently about Malloy, an experimental analytical query language built by members of Looker’s founding team. Upon reading the overview materials and documentation, it sounded like exactly what I was hoping for, but in order to develop a more informed opinion, I needed to get some experience with actually writing the language. Hence, I decided to embark on a project to translate each of the 99 TPC-DS benchmark SQL queries to Malloy. This post will give an overview of the TPC-DS dataset, the queries, and the opinions I've formed about Malloy along the way…
The AI Multimodal Revolution with Junnan Li and Dongxu Li of BLIP & BLIP2 [Video]
As recently as January 2021, the challenge of "interpreting what is going on in a photograph" was considered "nowhere near solved." Today's guests Junnan Li and Dongxu Li changed that with their publication and open-sourcing of BLIP, which delivered state-of-the-art performance on image captioning and other vision-language tasks. BLIP became the #18 most-cited AI paper of 2022, and now Junnan and Dongxu are back with BLIP-2, this time showing how small models can harness the power of existing foundation models to do multi-modal tasks…
What does Skew mean?
In the normal use of the word skew, we might say that [the above] distribution “skews to the left”…But according to statisticians, that would be wrong, because within the field of statistics, skew has been given a technical meaning that is contrary to its normal use…
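For readers who want to check the sign convention themselves, here is a minimal sketch, assuming the usual moment-based definition of skewness (third central moment divided by the standard deviation cubed). The function name sample_skewness and both toy datasets are our own made-up illustrations, not taken from the linked post. Under this convention, a distribution whose bulk sits on the left with a long tail to the right comes out positive, which statisticians call right-skewed.

```python
def sample_skewness(values):
    """Moment-based (population-style) skewness: m3 / m2 ** 1.5.

    Hypothetical helper for illustration; it applies no small-sample
    bias correction.
    """
    n = len(values)
    mean = sum(values) / n
    # Second and third central moments.
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


# Made-up data: most values are small, with one large value
# stretching the tail to the right.
long_right_tail = [1, 2, 2, 3, 3, 3, 4, 4, 10]

# The mirror image: the long tail now points left.
long_left_tail = [-x for x in long_right_tail]

print("long right tail:", sample_skewness(long_right_tail))  # positive
print("long left tail: ", sample_skewness(long_left_tail))   # negative
```

The sign follows the tail, not the bulk: the single large value dominates the cubed deviations, so the data that looks "bunched to the left" is the one with positive skew.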
Datacast episode 110: wisdom in building data infrastructure, lessons from open-source development, the missing readme, and the future of data engineering with Chris Riccomini
Our wide-ranging conversation touches on his 15+-year experience working on infrastructure as an engineer and manager at PayPal, LinkedIn, and WePay; his involvement in open source as the original author of Apache Samza and an early contributor to Apache Airflow; tactical advice on conducting technical interviews, building internal data infrastructure, and writing a technical book; his experience investing and advising startups in the data space; the future of data engineering; and much more…
Jobs
Software Developer Job Opportunity at Observable, Inc
SALARY AND HOURS: $107,640 - $150,000 per year; 40 hours per week.
EXPERIENCE AND REQUIREMENTS: Bachelor's degree in Computer Science.
DESCRIPTION OF DUTIES:
Design, develop, test, deploy, maintain and improve software
Write code for Observable’s product and platform, create reliable and sustainable systems, and develop prototypes quickly
Write unit and integration tests to ensure the software is functioning correctly and securely
Deploy and release software at a regular cadence
Support and improve the software through on call and support tasks
Communicate and interact with users to understand their requirements and respond to their issues.
Collaborate on projects with designers, engineers and product managers.
Apply here
Want to post a job here? Email us for details --> team@datascienceweekly.org
Training & Resources
Visualizing Pretty Building Footprints With QGIS: Using Microsoft Building Footprints to Create Abstract City Maps
I made some maps the other day just to explore the Microsoft Building Footprints dataset and they turned out pretty nice. Here’s a short rundown of how you can do this too…
Waterloo’s CS 886: Graph Neural Networks
This seminar will cover seminal work in the space of graph neural networks. For example, spectral and spatial convolutional graph neural networks, graph attention networks, invariant and equivariant graph neural networks, general message passing graph neural networks. We will focus on both practical and theoretical aspects of graph neural networks. Practical aspects include scalability and performance on real data. Examples of theoretical questions include: what does convolution do to the input data? Does convolution improve generalization compared to not using a graph? How do multiple convolutions change the data and how do they affect generalization?…
Geographic Data Science with R: Visualizing and Analyzing Environmental Change
Geographic Data Science with R (GDSWR) provides a series of tutorials aimed at teaching good practices for using time series and geospatial data to address topics related to environmental change. It is based on the R language and environment, which currently provides the best option for working with diverse sources of spatial and non-spatial data using a single platform…
Last Week's Newsletter's 3 Most Clicked Links
* Based on unique clicks.
** Find last week's issue #485 here.
Cutting Room Floor
Changing career at age 62… is it even possible? [Reddit Discussion]
Our community must get serious about opposing OpenAI [Reddit Discussion]
New US Legislation to Boost Data Science in K-12 and Higher Ed
Thanks for joining us this week :)
Hope you have an amazing weekend!
All our best,
Hannah & Sebastian
P.S., If you enjoyed reading this, please let us know by clicking the ❤️ button below. :)
Copyright © 2013-2023 DataScienceWeekly.org, All rights reserved.