Issue #509
August 24, 2023
Hello and thank you for tuning in to Issue #509!
Once a week we write this email to share the links we thought were worth sharing in the Data Science, ML, AI, Data Visualization, and ML/Data Engineering worlds.
Seeing this for the first time? Subscribe here:
If you find this newsletter helpful to your job, consider becoming a paid subscriber here: https://datascienceweekly.substack.com/subscribe :)
If you don’t find this email useful, please unsubscribe here.
And now, let's dive into some interesting links from this week :)
Editor's Picks
Structure and Interpretation of Computer Programs
38 years ago today, one of the most influential programming textbooks was published: MIT’s “Structure and Interpretation of Computer Programs.” Read it for free here…
AI Forecasting: Two Years In
Over the past two years, I and many other forecasters registered predictions about the state-of-the-art accuracy on ML benchmarks in 2022-2025. In this blog post, I evaluate the predictions for 2023…
Understanding Automatic Differentiation in 30 lines of Python
I'm a Machine Learning engineer and I use libraries like TensorFlow and PyTorch in my work to train my neural networks. And it's been a while since I wanted to write the simplest piece of code to perform what is called automatic differentiation, which is at the heart of neural network training…In this article, I will try to iteratively build the simplest code to calculate derivatives automatically on scalars…
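For a feel of what "derivatives automatically on scalars" can look like, here is a minimal reverse-mode sketch of our own. It is not the article's code: the class name `Var` and its methods are hypothetical, and it supports only addition and multiplication. Each value remembers its inputs and the local derivative with respect to each one, and `backward` applies the chain rule along every path back to the inputs.

```python
class Var:
    """A scalar that records how it was computed (hypothetical name)."""

    def __init__(self, value, parents=()):
        self.value = value
        # Pairs of (parent Var, derivative of this value w.r.t. that parent).
        self.parents = parents
        self.grad = 0.0

    def __add__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        # d(a + b)/da = 1 and d(a + b)/db = 1
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        # d(a * b)/da = b and d(a * b)/db = a
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    # Addition and multiplication commute, so plain numbers can sit on the left.
    __radd__ = __add__
    __rmul__ = __mul__

    def backward(self, upstream=1.0):
        # Chain rule: accumulate the gradient arriving along this path,
        # then pass it on to each parent scaled by the local derivative.
        self.grad += upstream
        for parent, local in self.parents:
            parent.backward(upstream * local)


x = Var(3.0)
y = Var(4.0)
z = x * y + x * x
z.backward()
print(z.value, x.grad, y.grad)
```

This version walks every path through the expression separately, which keeps it short but gets slow on large graphs. Libraries like the ones mentioned above avoid that by visiting each node once in topological order.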
A Message from this week's Sponsor:
Hire AE’s World Class Tech Team
Accelerate your success with AE's elite team of experts!
🚀 Get ahead with swift development of Minimum Viable Products (MVPs).
🚀 Lead the way in innovation with Digital Transformation Initiatives.
🚀 Boost your ROI with tailored AI/ML solutions.
Schedule a Consultation Today
* Want to sponsor the newsletter? Email us for details --> team@datascienceweekly.org
Data Science Articles & Videos
The Race to Develop Artificial Intelligence That Can Identify Every Species on the Planet
Scientists are building machine-learning-powered software that can recognize a species based solely on a cellphone picture…
Create a roadmap for the next 5 years [Reddit Discussion]
My boss told me to draw up a roadmap for the next five years. My "team" consists of me, one person. I am responsible for everything related to data analytics (tech stack and current tasks: Azure Databricks and some services, PBI, some ML/Predictive, rarely DL, some SQL, some Power Apps/Power Automate, maybe Excel, Git/GitHub and so on...)…I'm a bit overwhelmed with the task of creating a roadmap. How are you positioning yourself for the next five years? What should be considered?…
Why You (Probably) Don’t Need to Fine-tune an LLM
On their own, people often run into issues with base model LLMs: “the model didn’t return what I wanted” or “the model hallucinated, its answer makes no sense” or “the model doesn’t know anything about Y because it wasn’t trained on it”. People sometimes turn to a fairly involved technique called fine-tuning, in hopes that it will solve all of the above. In this post, we’ll talk about why fine-tuning is probably not necessary for your app…
Modern Data Show Podcast Season 2, Episode 5: What's Fundamentally Wrong with Modern Data Stack with Lauren Balik, Owner at Upright Analytics
Lauren Balik discusses why she believes the modern data stack is flawed and the three factors that affect the cost of a data platform. Balik also compares building versus buying a data platform and recommends an OLAP database in the cloud for small companies. However, she thinks centralizing data out of a line of business is a mistake for larger companies…
Announcing Python in Excel: Next-Level Data Analysis for All
Today, Anaconda and Microsoft announced a groundbreaking innovation: Python in Excel. This marks a transformation in how Excel users and Python practitioners approach their work…Now you can write Python code directly in Microsoft Excel’s grid—no Python installation required…
I am a 10 YOE (SSIS/low-code) DE preparing to transition into tier 1 tech companies. Here's my study plan in case it helps someone else. [Reddit Discussion]
Everything is listed in order of importance. I'm breaking my prep down into:
DS & Algorithms
System Design
Product Sense (for Meta this is the #2 priority)
Data Modeling
ML Concepts
Cloud (AWS is the most commonly used)…
To Understand Transformers, Focus on Attention
If you can get this one thing, the rest will make sense…Throughout this tutorial I will highlight key ideas with the phrase “Attention is just…” to share different perspectives that have helped me understand. In most places below, the code is hidden by default (behind “> Show the code” expanders) because I don’t want you getting distracted by how I made certain figures; the point is the figure, not the code. This lesson is intended to be “coding-free”!…
Creating template files with R
If you find yourself regularly copying and pasting content between files, you can use R to do it for you! For repetitive tasks you can't fully automate, using template files is a great way to save time and this blog post will show you how to make them in R…
Software Engineering for Data Scientists by Catherine Nelson - Chapter 3
Chapter 3. Object-Oriented Programming and Functional Programming…I just added a new chapter to the online version of my new book, "Software Engineering for Data Scientists". It's about object-oriented and functional programming in Python. This is a raw, unedited version, and all feedback is appreciated!…
Hire a data manager
A data manager is not in everyone's budget, but if you are able to make it work, they can be a great asset to your team!…“If you were struck by lightning, would other people be able to access and understand your data?” - Sarah Arena…While it may be feasible for a project coordinator or PI to take on data management tasks for a single project, as the number of projects grows within a lab, team members can be spread too thin and data management can suffer. Hiring someone to specifically focus on data management allows all team members to specialize and excel in their area of expertise…
What skills should I pick up to make more money at my next job? [Reddit Discussion]
Hey all, I have a "Senior Analyst" title at my company, but by day I masquerade as the sole analytics/data engineer for my company. I'm a one-man data team managing the entire pipeline for about a 200-person company. Our tech stack for analytics consists of Fivetran and Stitch, dbt, Snowflake, and Looker.
I make good money (145k) and the job is chill, but it's all too easy, and I'm getting bored…In your experience and opinion, what skills could I pick up from here to get the biggest bump in pay when moving companies? How could I right now change up my company's tech stack and learn new tools and languages?…
Algebraic Topology for Data Scientists
I have three goals in writing this (free, pdf) book. The first is to bring people up to speed who are missing a lot of the necessary background. I will describe the topics in point-set topology, abstract algebra, and homology theory needed for a good understanding of TDA. The second is to explain TDA and some current applications and techniques. Finally, I would like to answer some questions about more advanced topics such as cohomology, homotopy, obstruction theory, and Steenrod squares, and what they can tell us about data. It is hoped that readers will acquire the tools to start to think about these topics and where they might fit in…
Jobs
GUCCI Global Senior Data Scientist
For the Gucci Global Data Science team based in Milan, we are currently seeking an English-speaking Senior Data Scientist.
In this role, you will report to the Global Corporate Director of Data Science and help the business in central decision making processes, have the opportunity to lead the technical development of a small team of bright and driven data scientists, collaborate with teams across different regions and areas of the business leveraging Gucci’s rich data sources, infrastructure and the power of machine learning and advanced analytics.
Influential, innovative and progressive, Gucci is reinventing a wholly modern approach to fashion.
The Gucci Data Science team is the new kid on the block, bringing fresh perspectives and a new way of working that will help the company in continuing its innovation path leveraging the power of data and ML.
Apply here
Want to post a job here? Email us for details --> team@datascienceweekly.org
Training & Resources
Free, Online: R beginner workshop series - Part 1!
By popular demand, we are excited to bring you an Intro to R Workshop Series! 🥳 Sessions are every Wednesday from 6-8pm, beginning on Sept 13th!…RLadies Vancouver will host a 4-week Intro to R workshop series on manipulating, analyzing, and plotting data in R.
This workshop is intended for anyone who:
has no prior coding experience in R or other languages.
has little self-taught coding experience.
is proficient in other languages, such as Python, and would like to transition into R…
CS388: Natural Language Processing (online MS version)
📣 Today we launched an overhauled NLP course to 600 students in the online MS programs at UT Austin. 98 YouTube videos 🎥 + readings 📖 open to all!…w/5 hours of new 🎥 on LLMs, RLHF, chain-of-thought, etc!…
Stanford XCS224U: Natural Language Understanding (2023)
Covers topics such as contextual word representations, information retrieval, in-context learning, behavioral evaluation of NLU models, NLP methods and metrics, and much more…
Last Week's Newsletter's 3 Most Clicked Links
Analysis of the data job market using "Ask HN: Who is hiring?" posts
Failed an interviewee because they wouldn't shut up about LLMs at the end of the interview
* Based on unique clicks.
** Find last week's issue #508 here.
Cutting Room Floor
Satellite Cloud Generator: A PyTorch-based tool to generate clouds for satellite images
Simulation-Based Prior Knowledge Elicitation for Parametric Bayesian Models
Introducing Code Llama, a state-of-the-art large language model for coding
Consciousness in Artificial Intelligence: Insights from the Science of Consciousness
Thank you for joining us this week :)
All our best,
Hannah & Sebastian
P.S.
If you found this newsletter helpful to your job, please consider becoming a paid subscriber here: https://datascienceweekly.substack.com/subscribe :)
Copyright © 2013-2023 DataScienceWeekly.org, All rights reserved.