Data Science Weekly - Issue 458
Issue #458 September 01 2022
Editor's Picks
Rules of Machine Learning: Best Practices for ML Engineering [PDF]
This document is intended to help those with a basic knowledge of machine learning get the benefit of best practices in machine learning from around Google. It presents a style for machine learning, similar to the Google C++ Style Guide and other popular guides to practical programming. If you have taken a class in machine learning, or built or worked on a machine learned model, then you have the necessary background to read this document...
Why are deep learning technologists so overconfident?
Deep learning researchers have been predicting for a while that the technology will make various professions obsolete and that self-driving cars are imminent. We’re still waiting. Some have even claimed that they are nearing artificial general intelligence, or AI capable of equalling or exceeding human performance at all tasks...Hype is nothing new to machine learning, but this wave seems different. Billions of dollars in funding have been allocated based on this hype, and it has led to a massive amount of public confusion...
AI-Generated Bible Art
All these images were generated by DALL·E 2, an AI that converts text to images. Hover your mouse over the image to see the prompt that generated it, and click to see a full-size version. These images mostly reflect stories from Genesis and Luke; my purpose is to explore a wide variety of genres and styles in these stories...
A Message from this week's Sponsor:
AI, BI, and Data Leaders: Dive Deep Into the Semantic Layer in a One-Day Virtual Summit
Our Semantic Layer is what makes data discoverable and usable - if it’s designed correctly. Join Snowplow, Databricks, AtScale, and 30+ top industry technologists to learn best practices and discuss the latest developments in semantic layers for enterprise data.
Free registration closes soon. Save your spot at the Semantic Layer Summit 2022 (virtual)
Data Science Articles & Videos
A Proposal For the Dartmouth Summer Research Project On Artificial Intelligence [PDF]
The proposal for the 1956 Dartmouth AI summer workshop is dated Aug 31, 1955, exactly sixty-seven years ago today. The oldest recorded use of the term "Artificial Intelligence"...
A Short Chronology Of Deep Learning For Tabular Data
Deep Learning For Tabular Data In Reverse Chronological Order...Below is a (growing) list of relevant papers along with links and short summaries. I will try to keep this list up to date as new publications arrive...
Understanding Diffusion Models: A Unified Perspective
Diffusion models have shown incredible capabilities as generative models; indeed, they power the current state-of-the-art models on text-conditioned image generation such as Imagen and DALL-E 2. In this work we review, demystify, and unify the understanding of diffusion models across both variational and score-based perspectives...
What are some dead ideas in machine learning or machine learning textbooks? [Reddit Discussion]
Every now and then I flip one of those books on ML from the 80s and see a bunch of algorithms or models such as Adaline, Helmholtz and Boltzmann machine, and wonder why virtually nobody talks about them anymore...Can someone in this field point out some algorithms/ideas that are basically dead or abandoned at this point?...
Melanie Mitchell: Seemingly ‘sentient’ AI needs a human in the loop
Language-generation models are advancing rapidly but there is a big difference between competence and comprehension, says Santa Fe Institute professor...In this wide-ranging discussion with the FT’s west coast editor, Richard Waters, she explains the potential and limits of the latest AI — as well as the technical and social challenges that lie ahead to ensure the technology will be genuinely beneficial...
Papers with Code: Top Trending ML Papers of the Month - August [Twitter Thread]
Here is a thread to catch up on the top 10 trending papers of August on @paperswithcode...
Using AI to decode speech from brain activity
We’ve developed an AI model that can decode speech from noninvasive recordings of brain activity...From three seconds of brain activity, our results show that our model can decode the corresponding speech segments with up to 73 percent top-10 accuracy from a vocabulary of 793 words, i.e., a large portion of the words we typically use on a day-to-day basis...
What does GPT-3 “know” about me?
Large language models are trained on troves of personal data hoovered from the internet. So I wanted to know: What does it have on me?...
AI/ML safety research in August [Twitter Thread]
What happened this month in AI/ML safety research. 🧵...
Big problem with companies now is they hire data scientists for tasks that don't require data science practices [Reddit Discussion]
I know everyone wanted to jump on the data science wagon and every big company invested heavily in data science departments. However many roles may list the title as Data Scientist or something data science related, but the position still falls under the realm of analytics...
NormConf is the tech conference about all the stuff that matters in data + ML, but doesn't get the spotlight
[REGISTER NOW!] What if there were a conference about all the mundane, behind-the-scenes, how-the-sausage-is-made, middlebrow, unsexy, normcore stuff in the data and ML parts of the tech world? We thought there should be!...
What sub-areas of AI will be a big deal in the future and why? [Reddit Discussion]
self learning with biological process...NLP...optimization theory...Continual Learning...Cognitive Reasoning...AI with VR...
Conference*
Join thousands of data humans like you to rethink how the world does data
Coalesce, hosted by dbt Labs, is the annual conference dedicated to advancing the practice of analytics engineering. This year the conference will take place October 17-21 with in-person and no-cost remote options to accommodate attendees in all timezones.
The event offers multiple opportunities to learn directly from the team who’s shaped an entire industry, meet and network with the dbt community and attend valuable educational sessions to gain dbt knowledge and implementation practices. dbt training and certification is also offered.
Coalesce prioritizes talks that present unique solutions to real data problems, as seen through the lens of the practitioners that face them every day. Group discounts are available by writing to coalesce@getdbt.com. To learn more, view the agenda, speakers, and register here!
*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!
Jobs
Data Scientist - Success Academy Charter Schools, Inc - NYC
This new Data Scientist role will be a key contributor to our mission of driving innovation across the organization. Reporting to the Leader of Enterprise Analytics, this role will be responsible for working with stakeholders in various functions to understand areas of opportunity, developing analytical solutions ranging from dashboards to sophisticated mathematical models, and helping functional teams adopt those solutions. This role will be part of a highly collaborative team of professionals with a wide range of skills including data science, data engineering, business analysis, and project management....
Want to post a job here? Email us for details --> team@datascienceweekly.org
Training & Resources
Causal Inference Bootcamp
Between 2013 and 2015, I worked with Jim Speckart and the Social Science Research Institute (SSRI) at Duke to create a series of videos on causal inference. These are nontechnical explanations of the basic methods social scientists use to learn about causality. They're aimed at high school seniors or first-year undergraduates, and are quite short, around 2 to 5 minutes on average...
The Data Science Interview Book
The Data Science Interview book has been making steady progress over the months...In the last year it has been used by readers in more than 90 countries. Be sure to check it out...
Topic Detection in Podcast Episodes with Python
I’ve been fascinated by Machine Learning and Topic Detection with Python lately, so I wrote a short blog post about it. In the post, I’m transcribing a podcast using a speech-to-text Python SDK and deriving a list of topics from it to quickly discover, well, the topic of it! 😄...
What you’re up to – notes from DSW readers
You and your work could be featured here :)
* To share your projects and updates, share the details here.
** Want to chat with one of the above people? Hit reply and let us know :)
Last Week's Newsletter's 3 Most Clicked Links
* Based on unique clicks.
** Find last week's newsletter here.
P.S. Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :)
All the best,
Hannah & Sebastian