Data Science Weekly - Issue 384
Issue #384 Apr 01 2021
Editor Picks
Redefining what a map can be with new information and AI
Sixteen years ago, many of us held a printout of directions in one hand and the steering wheel in the other to get around, without information about the traffic along our route or details about when our favorite restaurant was open. Since then, we’ve been pushing the boundaries of what a map can do, propelled by the latest machine learning. This year, we’re on track to bring over 100 AI-powered improvements to Google Maps so you can get the most accurate, up-to-date information about the world, exactly when you need it. Here's a snapshot of how we're using AI to make Maps work better for you with a number of updates coming this year...
Stop Calling Everything AI, Machine-Learning Pioneer Says
“While the science-fiction discussions about AI and super intelligence are fun, they are a distraction,” he says. “There’s not been enough focus on the real problem, which is building planetary-scale machine learning–based systems that actually work, deliver value to humans, and do not amplify inequities...
What Data Can’t Do
When it comes to people—and policy—numbers are both powerful and perilous...
A Message from this week's Sponsor:
Kickstart Your New Career with a Data Science & Analytics Bootcamp
Bootcamps are starting soon! Don’t miss your chance to join a Data Scientist-led, online Metis bootcamp with career support until you’re hired. Ready to take your data science or analytics career to the next level? Learn more about the Metis Online Data Science & Analytics Bootcamps.
Data Science Articles & Videos
Avoiding the 4 Major Pitfalls of Data Science Projects
There are countless reasons a project could fail due...selection bias, target leakage, data drift, p-hacking, overfitting and more...In this article, however, we’ll focus on some fundamental issues pertaining to collaboration with the business...
The Language Interpretability Tool (LIT) is an open-source platform for visualization and understanding of NLP models
The Language Interpretability Tool (LIT) is for researchers and practitioners looking to understand NLP model behavior through a visual, interactive, and extensible tool...Use LIT to ask and answer questions like: a) What kind of examples does my model perform poorly on?, b) Why did my model make this prediction? Can it attribute it to adversarial behavior, or undesirable priors from the training set?, c) Does my model behave consistently if I change things like textual style, verb tense, or pronoun gender?, and more...
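The consistency question in (c) boils down to a counterfactual test: perturb the input in a way that should not matter and check whether the prediction changes. LIT does this through its own UI and generators; the snippet below is not LIT's API, only a hypothetical, framework-free sketch of the idea. The pronoun map, `toy_model`, and the helper names are all illustrative assumptions.

```python
import re

# Hypothetical pronoun map for illustration only. A real counterfactual
# generator must handle ambiguity (e.g. "her" can become "his" or "him").
PRONOUN_SWAPS = {"he": "she", "she": "he", "his": "her", "her": "his",
                 "him": "her", "himself": "herself", "herself": "himself"}
PATTERN = r"\b(" + "|".join(PRONOUN_SWAPS) + r")\b"


def swap_pronouns(text):
    """Swap gendered pronouns, preserving an initial capital letter."""
    def repl(match):
        word = match.group(0)
        swapped = PRONOUN_SWAPS[word.lower()]
        return swapped.capitalize() if word[0].isupper() else swapped
    return re.sub(PATTERN, repl, text, flags=re.IGNORECASE)


def toy_model(text):
    """Stand-in classifier: 'positive' if any positive keyword appears."""
    positive = {"great", "excellent", "brilliant"}
    words = set(re.findall(r"[a-z]+", text.lower()))
    return "positive" if positive & words else "negative"


def inconsistent_examples(model, texts):
    """Return the texts whose prediction changes after swapping pronouns."""
    return [t for t in texts if model(t) != model(swap_pronouns(t))]
```

An empty result means the model passed this particular perturbation; a non-empty one points at the examples worth inspecting further.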
The Chinese Approach to AI: An Analysis of Policy, Ethics, and Regulation
This paper explores China’s current AI policies, their future plans, and the ethical standards they’re working on. The authors zoom in on China’s country-wide strategic effort, i.e. the ‘New Generation Artificial Intelligence Development Plan’ (AIDP). The strategic aims of the plan can be divided up into 3 main goals: international competition, economic development, and social governance...
Announcing ICLR 2021 Outstanding Paper Awards
We are thrilled to announce the winners of the ICLR 2021 Outstanding Paper Awards. While there are 860 excellent papers in our program, and many of them of exceptional quality, we would like to highlight 8 papers that are especially noteworthy...Award winners (in alphabetical order) are...
If the number of machine learning PhD graduates is increasing rapidly, wouldn't it get exponentially harder to be hired for machine learning related jobs without a PhD? [ Reddit Discussion ]
It seems everyone wants to do machine learning these days, and the number of people with a PhD in machine learning is increasing rapidly. Wouldn't it get harder and harder to be employed in machine learning related jobs without a PhD?...
MADGRAD Optimization Method
We introduce MADGRAD, a novel optimization method in the family of AdaGrad adaptive gradient methods. MADGRAD shows excellent performance on deep learning optimization problems from multiple fields, including classification and image-to-image tasks in vision, and recurrent and bidirectionally-masked models in natural language processing. For each of these tasks, MADGRAD matches or outperforms both SGD and ADAM in test set performance, even on problems for which adaptive methods normally perform poorly...
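To make "dual averaging in the AdaGrad family" concrete, here is a simplified pure-Python sketch of the MADGRAD step as we read it from the paper and its reference code: gradients and squared gradients are accumulated with weights that grow like sqrt(k + 1), the denominator uses a cube root rather than AdaGrad's square root, and momentum is applied as averaging toward the dual-averaging point. The function name, defaults, and the omission of weight decay are our own assumptions; use the authors' released implementation for real work.

```python
import math


def madgrad_minimize(grad_fn, x0, lr=0.1, momentum=0.9, eps=1e-6, steps=500):
    """Minimize a function with a simplified MADGRAD-style update.

    grad_fn maps a list of floats to the list of partial derivatives.
    Weight decay and other options of the reference implementation are omitted.
    """
    x0 = [float(v) for v in x0]
    x = list(x0)
    s = [0.0] * len(x0)    # lambda-weighted sum of gradients (dual average)
    nu = [0.0] * len(x0)   # lambda-weighted sum of squared gradients
    ck = 1.0 - momentum    # weight given to the dual-averaging point z
    for k in range(steps):
        g = grad_fn(x)
        lam = lr * math.sqrt(k + 1)  # step weights grow like sqrt(k + 1)
        for i in range(len(x)):
            s[i] += lam * g[i]
            nu[i] += lam * g[i] ** 2
            # Cube-root denominator, instead of AdaGrad's square root.
            z = x0[i] - s[i] / (nu[i] ** (1.0 / 3.0) + eps)
            # Momentum as averaging between the current iterate and z.
            x[i] = (1.0 - ck) * x[i] + ck * z
    return x
```

For example, `madgrad_minimize(lambda v: [2 * (v[0] - 3), 2 * (v[1] + 1)], [0.0, 0.0])` should approach the minimizer `[3, -1]` of the quadratic `(x - 3)^2 + (y + 1)^2`. Note that `z` is always computed relative to the starting point `x0`, which is what distinguishes dual averaging from a plain AdaGrad step.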
How we use AutoML, Multi-task learning and Multi-tower models for Pinterest Ads
In this blog post, we explain how key technologies, such as AutoML, DNN, Multi-Task Learning, Multi-Tower models, and Model Calibration, allow for highly performant and scalable solutions as we build out the ads marketplace at Pinterest. We also discuss the basics of AutoML and how it’s used for Pinterest Ads...
Predicting your next query even before you type!
On the Flipkart website, when you click inside the Search box, you’ll see a list of auto-suggestions that help you plan the best query with less typing effort...We introduced the ‘Discover More’ feature just below the auto-suggestions list, which displays personalized queries based on user’s past shopping activities on Flipkart and some popular user queries across different categories...
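The post does not spell out Flipkart's ranking model, so the following is only a hypothetical heuristic showing the general idea of blending a user's own past queries with globally popular ones. The weighting scheme, `user_weight`, and the function name are illustrative assumptions, and `popular_queries` is assumed to be a dict of lower-cased query to count.

```python
from collections import Counter


def discover_more(user_history, popular_queries, k=5, user_weight=2.0):
    """Rank candidate queries by a weighted mix of personal and global frequency."""
    personal = Counter(q.lower() for q in user_history)
    total_personal = sum(personal.values()) or 1
    total_popular = sum(popular_queries.values()) or 1
    candidates = set(personal) | set(popular_queries)

    def score(q):
        # Personal share is up-weighted so a user's own habits dominate ties.
        return (user_weight * personal[q] / total_personal
                + popular_queries.get(q, 0) / total_popular)

    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(candidates, key=lambda q: (-score(q), q))[:k]
```

A production system would also filter by category, recency, and what is already shown in the auto-suggestion list; this sketch ignores all of that.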
How to Compare ML Experiment Tracking Tools to Fit Your Data Science Workflow
Tracking your experiments in an efficient and organized manner is crucial. Having to recreate a model from a couple of months ago that holds an important result for your project is a frustrating situation that can be avoided with some foresight. With so many experiment management tools and platforms offering experiment tracking, the choice can be more of a hindrance than a help: it can be hard to tell them apart, which can keep us from picking one. We hope that this post will help you to discover different experiment tracking tools and pick the one that fits your data science workflow best...
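At their core, these tools record the parameters and metrics of every run so that results can be found and reproduced later. As a rough, hypothetical illustration of that core (not the API of any tool covered in the post), a minimal tracker can append one JSON record per run to a log file and query it afterwards:

```python
import json
import time
import uuid
from pathlib import Path


def log_run(log_path, params, metrics):
    """Append one experiment record (params + metrics) as a JSON line."""
    record = {"run_id": uuid.uuid4().hex, "timestamp": time.time(),
              "params": params, "metrics": metrics}
    with Path(log_path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, sort_keys=True) + "\n")
    return record["run_id"]


def best_run(log_path, metric, maximize=True):
    """Return the logged run with the best value of `metric`."""
    with Path(log_path).open(encoding="utf-8") as f:
        runs = [json.loads(line) for line in f if line.strip()]
    runs = [r for r in runs if metric in r["metrics"]]
    pick = max if maximize else min
    return pick(runs, key=lambda r: r["metrics"][metric])
```

Dedicated tools add what this sketch lacks, such as artifact and code versioning, environment capture, comparison dashboards, and team sharing, which is where the differences discussed in the post come in.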
Cross-validation: what does it estimate and how well does it do it? [PDF]
Cross-validation is a widely-used technique to estimate prediction error, but its behavior is complex and not fully understood. Ideally, one would like to think that cross-validation estimates the prediction error for the model at hand, fit to the training data. We prove that this is not the case for the linear model fit by ordinary least squares; rather it estimates the average prediction error of models fit on other unseen training sets drawn from the same population...
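For reference, this is the quantity under study: a plain K-fold estimate of mean squared prediction error for a one-feature OLS fit, written from scratch as a sketch (the helper names are our own). Per the paper's result, the number it returns should be read as an estimate of the average error of models fit on other training sets from the same population, not of the error of the single model fit to this particular data.

```python
import random


def fit_ols(xs, ys):
    """Ordinary least squares for y = a + b*x (single feature)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def kfold_cv_error(xs, ys, k=5, seed=0):
    """K-fold cross-validation estimate of mean squared prediction error."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint folds covering all points
    sq_errors = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        a, b = fit_ols([xs[i] for i in train], [ys[i] for i in train])
        # Score the model only on the points it did not see during fitting.
        sq_errors.extend((ys[i] - (a + b * xs[i])) ** 2 for i in fold)
    return sum(sq_errors) / len(sq_errors)
```

Each fold's model is trained on a different subset of the data, which is one intuitive way to see why the average over folds tracks the error of "a model fit on a training set like this one" rather than of one specific fitted model.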
Training*
Quick Question For You: Do you want a Data Science job?
After helping hundreds of readers like you get Data Science jobs, we've distilled all the real-world-tested advice into a self-directed course.
The course is broken down into three guides:
Data Science Getting Started Guide. This guide shows you how to figure out the knowledge gaps that MUST be closed in order for you to become a data scientist quickly and effectively (as well as the ones you can ignore)
Data Science Project Portfolio Guide. This guide teaches you how to start, structure, and develop your data science portfolio with the right goals and direction so that you are a hiring manager's dream candidate
Data Science Resume Guide. This guide shows how to make your resume promote your best parts, what to leave out, how to tailor it to each job you want, as well as how to make your cover letter so good it can't be ignored!
Click here to learn more ...
*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!
Jobs
Data Scientist - HelloFresh - Chicago, IL or New York, NY
Embedded in the NYC Tech Hub, we are building a cross-functional team of data scientists, analysts and engineers with the mission to bring the modeling and analytical capabilities of our marketing organization to the next level.
As a Data Scientist, you will support the analytic needs of our Growth organization comprising Technology, Digital Product and Marketing. You will play a pivotal role in helping us continue to succeed as the leading global meal kit provider. This role will solve challenging problems using vast repositories of customer data to provide detailed and actionable insights; core responsibilities include the development and automation of Marketing BI tools, predictive modeling, professional-grade dashboarding and reporting for some of our most critical initiatives and enhancing and facilitating the information extraction process...
Want to post a job here? Email us for details >> team@datascienceweekly.org
Training & Resources
Ask HN: What are the best data science bootcamps? [HN Discussion]
I'd like to move into data science by taking a ~3 month bootcamp. Which bootcamps are most respected by employers?...
Out of Distribution Generalization in Machine Learning
Machine learning has achieved tremendous success in a variety of domains in recent years. However, a lot of these success stories have been in places where the training and the testing distributions are extremely similar to each other. In everyday situations, when models are tested on data that differs even slightly from what they were trained on, ML algorithms can fail spectacularly. This research attempts to formally define this problem, identify which assumptions about our data are reasonable to make, and determine what kind of guarantees we can hope to obtain from them...
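A toy illustration of the failure mode (covariate shift, one specific kind of distribution shift, and not the formal framework the research develops): a straight line fit to y = x² on inputs in [0, 1] looks accurate on held-out points from the same range but is badly wrong once the inputs move to [2, 3]. The data-generating function and ranges are arbitrary choices for the example.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns the fitted function."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    return lambda x: a + b * x


def mse(model, xs, target):
    """Mean squared error of the model against the true function on inputs xs."""
    return sum((model(x) - target(x)) ** 2 for x in xs) / len(xs)


def true_fn(x):
    return x * x  # data-generating function, unknown to the model


train_xs = [i / 50 for i in range(51)]           # training inputs in [0, 1]
model = fit_line(train_xs, [true_fn(x) for x in train_xs])

in_dist_xs = [0.01 + i / 50 for i in range(50)]  # test inputs from the same range
shifted_xs = [2 + i / 50 for i in range(51)]     # test inputs shifted to [2, 3]
in_dist_err = mse(model, in_dist_xs, true_fn)
shifted_err = mse(model, shifted_xs, true_fn)
print(f"in-distribution MSE: {in_dist_err:.4f}, shifted MSE: {shifted_err:.2f}")
```

Nothing in the training data signals the problem; only an assumption about how the test inputs relate to the training inputs (or about the function class) could, which is the kind of assumption the research tries to make precise.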
What will the major ML research trends be in the 2020s? [Reddit Discussion]
What do you think the next 10 years will bring in ML research? What conventionally accepted trend do you think will not happen?...e.g...a) Will deep learning continue to eat everything?, b) Will multi-task multi-domain learning make few-shot learning available for most domains? (Or is deep learning on the slow end of the sigmoid curve now?), c) Will safe, ethical, explainable AI rise, or is that hogwash?, d) Will advances decouple from compute power?, e) Will Gary Marcus and Judea Pearl win out in the symbolic/structural/causal war against deep learning?, f) Are there still major breakthroughs in language? Do we just finetune GPT-3?, g) Will we make big breakthroughs in theory and fundamental ML? Or is this the decade of application? (Healthcare will finally deploy models that beat logistic regression!)...
Books
Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits
Integrate scikit-learn with various tools such as NumPy, pandas, imbalanced-learn, and scikit-surprise and use it to solve real-world machine learning problems...
For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.
P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian