Data Science Weekly - Issue 375
Issue #375 Jan 28 2021
Editor Picks
Scaling An ML Team (0–10 People)
So you’re doing an ML project! Maybe you want to build an object detection system for a robotics application or you want to add a recommender system to your webapp. You’ll need a team to build and improve this ML system. In the beginning, this can be a single (very stressed) engineer hacking together an MVP, but it can evolve into an entire department with highly specialized teams and hundreds of people. At each stage of developing a model pipeline, you will encounter different problems that require different team structures to overcome...
Induction, Inductive Biases, and Infusing Knowledge into Learned Representations
Our goal in building machine learning systems is, with rare exceptions, to create algorithms whose utility extends beyond the dataset in which they are trained. In other words, we desire intelligent systems that are capable of generalizing to future data. The process of leveraging observations to draw inferences about the unobserved is the principle of induction...
GuitarML
This post is going to focus on the recent work of leveraging Machine Learning to directly learn the audio processing characteristics of circuits. This stands in a modern context where digital modeling amplifiers like the Kemper, which rely on traditional Digital Signal Processing (DSP) techniques rather than Machine Learning, have seen mass adoption in the last several years. Here we will discuss what it means to digitally model an amplifier and how Machine Learning is beginning to make an impact. We'll start by discussing the basics of the problem from a Control Theory perspective and how DSP has approached the solution. We'll then present some work on how people are using Machine Learning to solve the problem as well as point to some open source projects so you can build your own ML-powered guitar circuit models...
A Message from this week's Sponsor:
Online Data Science Programs from Drexel University
Find your algorithm for success with an online data science degree from Drexel University. Gain essential skills in tool creation and development, data and text mining, trend identification, and data manipulation and summarization by using leading industry technology to apply to your career.
Learn more.
Data Science Articles & Videos
AI in Architecture: Is It a Good Match?
In this overview of AI in architecture, we'll look at emerging tools, systems and ideas related to AI in architecture — as well as some of the obstacles to automation...
Why your AI-based monitoring startup will fail
Please do not call me if you are selling a monitoring project that uses AI to do amazing things. Since 1991 I have been contacted by sales people from startups that use AI to make their monitoring system awesome. The first couple times I took their call and set up a demo. I was always disappointed. Now I just ignore these phone calls and emails...
Where Programming, Ops, AI, and the Cloud are Headed in 2021
In this report, we look at the data generated by the O’Reilly online learning platform to discern trends in the technology industry—trends technology leaders need to follow...
Naver AI Lab Researchers Relabel 1.28 Million ImageNet Training Images
A team of researchers from South Korea’s Naver AI Lab says they’ve found a computationally efficient re-labelling strategy that fixes a significant flaw in ImageNet...ImageNet’s popularity however doesn’t mean it’s perfect. The de-facto benchmark for the image classifiers also contains a significant level of label noise, and although many ImageNet samples contain multiple object classes, often only one of the present categories has been labelled...
The State of AI Ethics Report (Jan 2021)
To save you time and quickly get you up to speed on what happened in the past quarter, we’ve distilled the research & reporting around 8 key themes: 1) Algorithmic Injustice, 2) Discrimination, 3) Ethical AI, 4) Labor Impacts, 5) Misinformation, 6) Privacy, 7) Risk & Security, and 8) Social Media...
GENIE
GENIE is a leaderboard for natural language generation tasks. To provide a more accurate assessment of progress, we use human evaluation of the entries, gathered dynamically using crowdsourcing (Amazon Mechanical Turk). Our leaderboard offers a unified approach to summarizing progress across a wide set of text generation tasks (question answering, summarization, machine translation, and commonsense reasoning)...
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity [YouTube Video Explainer]
Scale is the next frontier for AI. Google Brain uses sparsity and hard routing to massively increase a model's parameters, while keeping the FLOPs per forward pass constant. The Switch Transformer compares favorably to its dense counterparts in terms of speed and sample efficiency and breaks the next magic number: One Trillion Parameters...
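For readers curious how "hard routing" can add parameters without adding per-token compute, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the paper's implementation: each expert is a single square weight matrix standing in for a full feed-forward block, and the helper names (make_layer, switch_layer, expert_param_count) are hypothetical.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


def make_layer(num_experts, dim, seed=0):
    """Build random router and expert weights (hypothetical toy layer)."""
    rng = random.Random(seed)
    router = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_experts)]
    experts = [
        [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(dim)]
        for _ in range(num_experts)
    ]
    return router, experts


def switch_layer(token, router, experts):
    """Top-1 routing: send the token to exactly one expert."""
    probs = softmax(matvec(router, token))
    chosen = max(range(len(probs)), key=probs.__getitem__)
    # Only the chosen expert's weights are touched, so the expert
    # multiply-adds per token stay at dim * dim however many experts exist.
    # The output is scaled by the router probability of the chosen expert.
    output = [probs[chosen] * y for y in matvec(experts[chosen], token)]
    return output, chosen


def expert_param_count(experts):
    """Total number of expert weights in the layer."""
    return sum(len(row) for expert in experts for row in expert)


if __name__ == "__main__":
    token = [0.5, -1.0, 0.25, 2.0]
    for n in (2, 8):
        router, experts = make_layer(n, dim=len(token))
        _, chosen = switch_layer(token, router, experts)
        print(n, "experts:", expert_param_count(experts), "expert params, routed to", chosen)
```

In this toy version, going from 2 to 8 experts quadruples the expert parameters, while each token still passes through one dim-by-dim expert. The router itself does grow with the number of experts, but it is small compared with the experts.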
What is AI? / Basic Questions with Professor John McCarthy
Q. What is artificial intelligence?...A. It is the science and engineering of making intelligent machines, especially intelligent computer programs. It is related to the similar task of using computers to understand human intelligence, but AI does not have to confine itself to methods that are biologically observable...Q. Yes, but what is intelligence?...
The Top 5 Data Trends for CDOs to Watch Out for in 2021
This year, we’ll see several new data trends: the emergence of new data roles and data quality frameworks, the rise of the modern data stack and modern metadata solutions, and the convergence of data lakes and warehouses...
Teaching AI to manipulate objects using visual demos
To teach a robot to place a bottle, we’d first have to tailor its reward so it learns to move the bottle upright over the table. Then we’d have to give it a separate reward focused on teaching it to put the bottle down. This is a slow and tedious iterative process that’s not conducive to real-world use, and, ultimately, we want to create AI systems that can learn in the real-world as efficiently as people can...As a step toward this goal, we’ve created (and open-sourced) a new technique that teaches robots to learn in this manner — from just a few visual demonstrations...
Training*
A Flexible Data Bootcamp Designed For You
TDI’s Spring Data Science Fellowship Program
Whether you’re looking to transition from academia into the business world, or you want to move up the data ladder and become a data scientist, The Data Incubator’s Fellowship program is the right option for you.
Available both full-time and part-time, you’ll work with live code, real-world data sets, live instructors and career experts.
Submit your application today for your chance to work with one of our hiring partners like:
FreddieMac
DSW
Foursquare
Genentech
And more
Apply Now.
*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!
Jobs
Data Scientist - Apple Pay Analytics - NYC
You will play a key role improving the Apple Pay product experience. As a member of the analytics team you will be supporting a product function. You will partner with business owners, understand goals, craft KPIs and measure ongoing performance. You will initially engage with the product and engineering teams in ensuring that we have the appropriate instrumentation in place to deliver on these metrics. You will subsequently use advanced statistical, ML and analytical techniques to analyze product performance and identify key insights that inform product improvements and business strategy. The role requires a high degree of independence, ownership and collaboration working cross functionally across all levels of a highly matrixed organization...
Want to post a job here? Email us for details >> team@datascienceweekly.org
Training & Resources
Fundamentals of TinyML [Course, Free]
Focusing on the basics of machine learning and embedded systems, such as smartphones, this course will introduce you to the “language” of TinyML...
Architecture of machine learning systems [YouTube, Free]
A series on how to turn machine learning models into production-ready software solutions...Episode 1: a) Why data science needs software architecture & b) Mastering requirements...Episode 2: a) Taking technical decisions & b) Best practices for decision making...Episode 3: a) Communication and documentation & b) Skills and personal development...
Course notes for MSDS621 Introduction to Machine Learning at Univ of San Francisco
This course introduces students to the key processes, models, and concepts of machine learning for tabular/structured data, such as: a) data cleaning, b) dealing with missing data, c) basic feature engineering, d) feature selection, e) model implementation, f) model training, g) model assessment, and h) model interpretation...
Books
Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits
Integrate scikit-learn with various tools such as NumPy, pandas, imbalanced-learn, and scikit-surprise and use it to solve real-world machine learning problems...
For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.
P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian