Data Science Weekly - Issue 485

Curated news, articles and jobs related to Data Science.

Mar 09, 2023

Hello and Happy Thursday!

This is Hannah and Sebastian, curators of the Data Science Weekly newsletter.

Thank you for tuning in to Issue #485.

We appreciate your support :)

Once a week we write this email to share the links we thought were worth sharing in the Data Science, ML, AI, Data Visualization, and ML/Data Engineering worlds.

Hope you enjoy it.

If you find this useful to your work, please consider becoming a paid subscriber here:
https://datascienceweekly.substack.com/subscribe
Subscribe and join the fun! :)

And now, let's dive into some interesting links from this week:

Editor's Picks

Robin Williams would like to have a word with you [Twitter thread]
So if I asked you about data you’d probably give me the skinny on every data book ever written. Kimball? You know a lot about him. Life’s work, dimensional modeling, star schemas, grains, the whole works, right? But I bet you can’t tell me what it smells like in a data center…

Things DBs Don't Do - But Should
Knowing about things you can do is obviously useful - it helps you do things. But what can you do with information about things databases tend to not do?…In this blog, I'll point out functionality that is very often needed in data platforms and more likely than not, you will need to build yourself since your DB won’t handle it for you. Even though it really should. I know you will need to build all of this yourself, because I've seen it in almost every project I was part of over 20+ years...

The State of Competitive Machine Learning in 2022
A review of competitive machine learning in 2022. We summarize the state of the competitive landscape and analyze the 200+ competitions that took place in 2022. Plus a deep dive analysis of 67 winning solutions to figure out the best strategies to win at competitive ML…

A Message from this week's Sponsor:

Pinecone vector database

The Pinecone vector database makes it easy to build high-performance vector search applications. Developer-friendly, fully managed, and easily scalable without infrastructure hassles.

Use Pinecone to build semantic search, object recognition, recommendations, anomaly detection, and other vector-based functionality into your applications.

Data Science Articles & Videos

Eduardo Blancas - Using embedded SQL engines for plotting massive datasets on a laptop [Video]
This talk will show you a simple yet effective technique to visualize larger-than-memory datasets on your laptop by leveraging SQLite or DuckDB. No need to spin up a Spark cluster!…
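
For a rough idea of how this can work, here is a minimal sketch (not taken from the talk) that uses Python's built-in sqlite3 module. It assumes the technique amounts to computing the histogram bins inside the database, so only a handful of counts ever reach Python. The function name and the `readings` / `value` names in the usage note are hypothetical.

```python
import sqlite3


def histogram_counts(conn, table, column, lo, hi, bins):
    """Count rows per equal-width bin inside SQLite.

    Only one row per non-empty bin is returned to Python, so the full
    column never has to be loaded into memory.
    `table` and `column` are interpolated into the SQL text, so they must
    be trusted identifiers (hypothetical names in this sketch).
    """
    width = (hi - lo) / bins
    query = f"""
        SELECT MIN(CAST(({column} - ?) / ? AS INTEGER), ?) AS bin,
               COUNT(*) AS n
        FROM {table}
        WHERE {column} >= ? AND {column} <= ?
        GROUP BY bin
        ORDER BY bin
    """
    counts = [0] * bins
    # MIN(..., bins - 1) folds values equal to `hi` into the last bin.
    for bin_index, n in conn.execute(query, (lo, width, bins - 1, lo, hi)):
        counts[bin_index] = n
    return counts
```

Calling `histogram_counts(conn, "readings", "value", 0.0, 4.0, 4)` on an open connection returns a list of four counts, which can then be passed to any plotting library as bar heights.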

What Size Should Your Dashboard Be?
I downloaded some of my favorite business dashboards on Tableau Public to see what size they were...But what I saw was that these dashboards had a huge variance in size - they were all over the board. It was then that I added a question on both Twitter and LinkedIn asking the Tableau community, "what is your standard size for a dashboard?". I thought I'd see consistency in the answers, but no...not even close. Of course, being a data guy, I collected the responses. In about 5 hours, I had 50 responses so I decided to utilize those.…

The hot mess theory of AI misalignment: More intelligent agents behave less coherently
One popular AI risk centers on AGI misalignment. It posits that we will build a superintelligent, super-capable, AI, but that the AI's objectives will be misspecified and misaligned with human values…In this post, I experimentally probe the relationship between intelligence and coherence in animals, people, human organizations, and machine learning models. The results suggest that as entities become smarter, they tend to become less, rather than more, coherent. This suggests that superhuman pursuit of a misaligned goal is not a likely outcome of creating AGI…

I got a data engineering horror story, what is yours? [Reddit Discussion]
I don't know about you, but I have plenty of data engineering horror stories to share. I'd love to hear the one that still gives you shivers…Here's my highlight:…In our most important customer segment, it looks like it's starting to grow exponentially! Everyone is excited!…Suddenly an important manager calls me up…"hey, something is wrong with the north star…

Your guide to AI: March 2023
Welcome to the latest issue of your guide to AI, an editorialized newsletter covering key developments in AI research, industry, geopolitics and startups during February 2023…

How to avoid machine learning pitfalls: a guide for academic researchers
This document is a concise outline of some of the common mistakes that occur when using machine learning, and what can be done to avoid them…It covers five stages of the machine learning process: what to do before model building, how to reliably build models, how to robustly evaluate models, how to compare models fairly, and how to report results…

Online gradient descent written in SQL
Modern MLOps is complex because it involves too many components. You need a message bus, a stream processing engine, an API, a model store, a feature store, a monitoring service, etc…I believe MLOps shouldn’t be this complex…MLOps can be made simpler by bundling the logic into your database…In this post, I want to push this idea, and actually implement a machine learning algorithm within a relational database, using SQL…
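
For readers who have not met the algorithm, here is a minimal Python sketch of online gradient descent for linear regression with squared error. It is not the author's SQL implementation, only an illustration of the update rule that any such implementation has to express: the model sees one observation at a time and adjusts its weights immediately. The function name and default learning rate are assumptions made for this example.

```python
def online_gradient_descent(stream, n_features, learning_rate=0.01):
    """Fit a linear model one observation at a time.

    `stream` yields (features, target) pairs. Each pair is used for a
    single update and never revisited, which is what makes the method
    "online".
    """
    weights = [0.0] * n_features
    bias = 0.0
    for features, target in stream:
        prediction = bias + sum(w * x for w, x in zip(weights, features))
        error = prediction - target
        # Gradient of 0.5 * error**2 with respect to each weight is error * x,
        # and with respect to the bias it is error.
        weights = [w - learning_rate * error * x
                   for w, x in zip(weights, features)]
        bias -= learning_rate * error
    return weights, bias
```

Because each step needs only the current weights and one new row, the state to carry between updates is small, which is part of why the update is a plausible candidate for living next to the data.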

Making Music in Microsoft Excel [Video]
Software Engineer: “I built you this spreadsheet software you can run your entire company’s financial processes off of”…User: “sounds great, check out what I built with it” [from @BrentBrewington]...

Leveraging ChatGPT for Call for Papers Submissions
There are four primary reasons why individuals don't submit talks using CFP: 1. I didn’t know about the submission dates. 2. I do not possess information worth sharing. 3. I fear speaking in public. 4. It's hard to write an attractive submission....this blog is all about the fourth reason. The good news is that you can leverage the power of machine learning via ChatGPT as a writing aid…

Causalvis: Visualizations for Causal Inference
In this paper, we address this gap with Causalvis, a Python visualization package for causal inference. Working closely with causal inference experts, we adopted an iterative design process to develop four interactive visualization modules to support causal inference analysis tasks. The modules are then presented back to the experts for feedback and evaluation. We found that Causalvis effectively supported the iterative causal inference process…

I’m a Machine Learning Engineer for FAANG companies. What are some places looking for freelance / contract work for ML? [Reddit Discussion]
Recently, I submitted a post here asking for advice on how to get started. Because of that helpful post, I have started getting clients for ML contract work, set up some basics, and I'm now asking directly: Is anyone here looking for ML contract work to be done or know of any resources to find such leads? My main ideas for outreach are to post on forums such as this one, but also through LinkedIn networks, through servers such as Slack and Discord, and other places. If anyone has other ideas on good ways to do outreach, please let me know…

How will Language Modelers like ChatGPT Affect Occupations and Industries?
In this paper we present a methodology to systematically assess the extent to which occupations, industries and geographies are exposed to advances in AI language modeling capabilities. We find that the top occupations exposed to language modeling include telemarketers and a variety of post-secondary teachers such as English language and literature, foreign language and literature, and history teachers. We find the top industries exposed to advances in language modeling are legal services and securities, commodities, and investments…

Jobs

Graphics Intern (Summer 2023), Scientific American

I’m happy to report that Scientific American is hiring a summer news graphics intern!

Note that the application deadline is March 10 [tomorrow].

Scientific American seeks applicants that have an interest in science, health and environmental journalism, and an ability to gather, analyze, and visualize data. The intern will be fully integrated into our editorial team and contribute to our award-winning coverage of science discoveries, science policy, public health, social science, technology, and insights and innovations that matter. (NYC/hybrid. Applicants must demonstrate right to work in the US.) Comfort with Adobe Illustrator is a must.

More details: https://careers.springernature.com/job/New-York-Graphics-Intern%2C-Scientific-American/902290101/

Want to post a job here? Email us for details --> team@datascienceweekly.org

Training & Resources

Online textbook "Practical Statistics in Medicine with R"
This online textbook is based on my notes from a series of lectures given for a few years at the Aristotle University of Thessaloniki. This textbook can be used as support material for lectures on basic statistics using R at any level from beginner to advanced. I have paid particular attention to the form of the book, which I think should aid understanding the most common statistical tests using base R and pipe-friendly functions, coherent with the ‘tidyverse’ design philosophy. It can also be used as a support for self-teaching…

The Annotated CLIP (Part-1)
Learning Transferable Visual Models From Natural Language Supervision…This post is part-1 of the two series blog posts on CLIP. In this blog, we present an Introduction to CLIP in an easy to digest manner. We also compare CLIP to other research papers and look at the background and inspiration behind CLIP…

What happens when we train the largest vision-language model and add in robot experiences?
PaLM-E: An Embodied Multimodal Language Model…We propose embodied language models to directly incorporate real-world continuous sensor modalities into language models and thereby establish the link between words and percepts. Input to our embodied language model are multi-modal sentences that interleave visual, continuous state estimation, and textual input encodings…

Last Week's Newsletter's 3 Most Clicked Links

Meaningful metrics: How data sharpened the focus of product teams

Data science maturity and the cloud

Feature Selection And Feature Importance: How Are They Related?

* Based on unique clicks.
** Find last week's issue #484 here.

Thanks for joining us this week :)

Hope you have an amazing weekend!

All our best,
Hannah & Sebastian

P.S., If you enjoyed reading this, please let us know by clicking the ❤️ button below. :)

P.P.S., Enjoy the newsletter? Please forward it to your friends and colleagues :)
