Data Science Weekly - Issue 437
Issue #437 April 07 2022
Editor Picks
There’s more to data than distributions
Distribution shift is a fundamental challenge in machine learning. Whenever we attempt to deploy machine learning in the real world without considering how that real-world environment can change, whether through changes in technology (e.g., software vendors), changes in population and setting (e.g., new demographics), or changes in behavior (e.g., new reimbursement incentives), we fail to properly consider the ways in which the data can change or shift between train and test environments. If not considered, the model will inevitably fail....
What I learned from the Data Council 2022 data science conference
I attended the data science/data engineering conference Data Council 2022 this week. This was my first time attending this conference. One special thing about it is that there are a lot of startups and investors. There are several interesting themes we noticed...
σ-driven project management: when is the optimal time to give up?
Hi! It's your friendly project management theoretician. You might remember me from blog posts such as "Why software projects take longer than you think", which is a blog post I wrote a long time ago positing that software project completion times follow a log-normal distribution...My thesis for this blog post is that σ is an inherent property of the type of risk you have in your project portfolio, and that different values for σ warrant very different types of project management...
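To make the role of σ concrete, here is a minimal sketch using only the Python standard library. It relies on a standard property of the log-normal distribution: the mean exceeds the median by a factor of exp(σ²/2). Treating each sample as a "blow-up factor" (actual time divided by estimated time) with a median of 1 is an assumption made for illustration, and the function names are hypothetical rather than taken from the blog post.

```python
import math
import random
import statistics


def lognormal_mean_to_median(sigma: float) -> float:
    """Closed-form mean/median ratio of a log-normal with shape parameter sigma."""
    return math.exp(sigma ** 2 / 2)


def simulate_blowup(sigma: float, n: int = 100_000, seed: int = 0) -> float:
    """Empirical mean/median ratio of n simulated blow-up factors.

    Assumption: the median blow-up factor is 1, i.e. mu = 0.
    """
    rng = random.Random(seed)
    samples = [rng.lognormvariate(0.0, sigma) for _ in range(n)]
    return statistics.fmean(samples) / statistics.median(samples)


# Compare the closed-form ratio with a simulation for a few values of sigma.
for sigma in (0.5, 1.0, 2.0):
    exact = lognormal_mean_to_median(sigma)
    simulated = simulate_blowup(sigma)
    print(f"sigma={sigma}: exact={exact:.2f}, simulated={simulated:.2f}")
```

The larger σ is, the further the average project drifts from the typical one, which is why portfolios with different σ may call for different management styles.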
A Message from this week's Sponsor:
Free Course: Natural Language Processing (NLP) for Semantic Search
Learn how to build semantic search applications by making machines understand language as people do. This free course covers everything you need to build state-of-the-art language models, from machine translation to question-answering, and more. Brought to you by Pinecone. Start reading now.
Data Science Articles & Videos
Exploring Neural Networks Visually in the Browser
What I wanted for neural networks: A constrained, simplified environment for building basic network topologies and experimenting live to see visually how different layer counts, sizes, activation functions, hyperparameters, etc. impact their functionality and performance...With this goal in mind, I created a browser-based tool for building, training, visualizing, and experimenting with neural networks. Since it runs on the web, I've embedded it directly in this post...
Improving forecasting by learning quantile functions
Learning the complete quantile function, which maps probabilities to variable values, rather than building separate models for each quantile level, enables better optimization of resource trade-offs...
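For context, a forecast at a single quantile level is usually scored with the pinball (quantile) loss, and averaging that loss over many levels is one common way to score a whole quantile function. The sketch below is plain Python written for illustration; it is not the method from the linked article, and the function name and sample numbers are made up.

```python
def pinball_loss(y_true, y_pred, q):
    """Average pinball loss of predictions y_pred at quantile level q (0 < q < 1).

    Under-prediction is weighted by q and over-prediction by (1 - q),
    so a high q penalizes forecasts that come in too low.
    """
    total = 0.0
    for y, y_hat in zip(y_true, y_pred):
        diff = y - y_hat
        total += max(q * diff, (q - 1) * diff)
    return total / len(y_true)


# Hypothetical demand observations scored against a constant forecast of 11.
observed = [10.0, 12.0, 9.0, 15.0, 11.0]
forecast = [11.0] * len(observed)
for q in (0.1, 0.5, 0.9):
    print(q, pinball_loss(observed, forecast, q))
```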
Can Computers Learn Common Sense?
A.I. researchers are making progress on a long-term goal: giving their programs the kind of knowledge we take for granted....
Tech Apprenticeships are opening everywhere! [Twitter Thread]
Tech Apprenticeships are opening everywhere! They are opportunities to pivot to a tech career whether you have a degree or not...🚀...Apprenticeships pay YOU to learn the skills while you work and gain mentorship from others...🥳...Here are some paying upwards of $3-6K per month.🧵...
Efficiently Initializing Reinforcement Learning With Prior Policies
A key challenge in Reinforcement Learning is learning policies from scratch in environments with complex tasks. Read how a meta-algorithm, Jump Start Reinforcement Learning, uses prior policies to create a learning curriculum that improves performance...
Andrew Ng: Unbiggen AI
Google Brain co-founder and Landing AI founder Andrew Ng has become an evangelist for what he calls the data-centric AI movement. “Collecting more data often helps,” he says. “But if you try to collect more data for everything, that can be a very expensive activity.”...
Javis.jl - Julia Animations and Visualizations [Video]
Javis.jl is a general-purpose animation library which builds on top of the Luxor.jl graphics library...It fills a gap in the Julia ecosystem by providing functionality to create object-based animations to communicate complex ideas through simple means. Furthermore, Javis provides the flexibility for users to extend Javis’s visualizations to a variety of applications. Users are already expressing complicated ideas through winsome domain-specific visuals such as planetary motion or brain mapping....
Everything gets a package? Yes, everything gets a package.
I recently read Ethan Rosenthal's Python Data Science setup, and I loved every line of it. His blog post encapsulates some of the best ideas that I wrote about in my Data Science Bootstrap knowledge base. In this blog post, I'd like to share my own data science setup and some new tooling that I've been working on to make it easy to achieve the same goals as I have...
Mushrooms communicate with each other using up to 50 ‘words’, scientist claims
Professor theorises electrical impulses sent by mycological organisms could be similar to human language...
Institute for AI & Fundamental Interactions Colloquium Series: A path towards human-level intelligence [Video]
Yann LeCun, VP and Chief AI Scientist, Meta talks about “A path towards human-level intelligence”...
Best of the visualisation web...December 2021
Since 2010 I have compiled and published monthly collections of links to some of the best, most interesting, or thought-provoking data visualisation-related content I come across. These collections are not always published immediately after the month in question has ended, but I try to do so as soon as my workload permits!...Here's a collection of some of the best content I encountered during December 2021...
Data Science Program*
Online Data Science Programs from Drexel University
Find your algorithm for success with an online data science degree from Drexel University. Gain essential skills in tool creation and development, data and text mining, trend identification, and data manipulation and summarization by using leading industry technology to apply to your career. Learn more.
*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!
Jobs
Lead Data Engineer - electricityMap - Copenhagen, Denmark
The electricityMap team is hiring a data engineer to help us build and maintain a scalable data pipeline and database that form the foundation of our mission to accelerate the energy system to a zero-carbon future.
In your role, you’ll be making sure the quality and availability of our data are stellar by building and improving our data infrastructure, as well as managing our internal tools. You will also be responsible for managing our machine learning pipelines at scale. We’re a small team, so you’ll be owning a lot of your own work and initiatives, but we will be there to support you!
Want to post a job here? Email us for details --> team@datascienceweekly.org
Training & Resources
How to Make Your Pandas Code Run Faster - Two Pandas Tricks I Wish I’d Known Earlier
Most of us data scientists come to use the pandas library at some point in our work. This post presents two tricks that will make your pandas code run faster. The first is for removing None values, and the second concerns extracting a set of values from a certain column...
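The article's own tricks are not reproduced here. As a point of reference, the two tasks it names can be done with standard pandas calls, as in the sketch below; the DataFrame and its column names are invented for illustration, and no speed claim is made about either form.

```python
import pandas as pd

# Hypothetical data with missing entries in both columns.
df = pd.DataFrame(
    {"city": ["Oslo", None, "Lima", "Oslo"], "visits": [3, 5, None, 2]}
)

# Task 1: drop rows where a given column is missing.
# pandas treats both None and NaN as missing values.
clean = df.dropna(subset=["city"])

# Equivalent boolean-mask form of the same filter.
clean_mask = df[df["city"].notna()]

# Task 2: collect the distinct values of a column as a Python set.
# unique() removes duplicates inside pandas before the set is built.
cities = set(clean["city"].unique())

print(clean)
print(cities)
```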
Complete Step-by-step Particle Swarm Optimization Algorithm from Scratch
And its implementation for solving a nonlinear control theory problem...
K-Medoid Clustering (PAM) Algorithm in Python
A step-by-step tutorial—with a solved example...
Books
Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits
Integrate scikit-learn with various tools such as NumPy, pandas, imbalanced-learn, and scikit-surprise and use it to solve real-world machine learning problems...
For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.
P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian