Data Science Weekly - Issue 350
Issue #350 Aug 6 2020
Editor Picks
Can GPT-3 Make Analogies?
Many articles and social media posts have given examples of GPT-3’s extraordinarily human-like text, its seemingly endless knowledge of (mostly Western) culture, and even its ability to create computer programs just by being given a few input-output examples. My purpose in this article is not to review the success, hype, or counter-hype on GPT-3. Instead, I want to explore its ability to make Copycat letter-string analogies...
NeRF in the Wild
Neural Radiance Fields for Unconstrained Photo Collections
NeRF-W is a method for reconstructing 3D scenes from internet photography. We apply it to the kinds of photos you might take on vacation: tourists, poor lighting, filters, and all...
Why You Should Do NLP Beyond English
7000+ languages are spoken around the world but NLP research has mostly focused on English. This post outlines why you should work on languages other than English...
A Message from this week's Sponsor:
Exclusive deal for Data Science Weekly readers
With authors including the creator of Keras and Google Cloud AI engineers, you can be sure that when you’re learning from Manning, you’re learning from the very best.
Data Science Articles & Videos
TikTok and the Sorting Hat
I’ve been fascinated with TikTok. Here in 2020, TikTok is, for many, including myself, the most entertaining short video app going. The U.S. government is considering banning the app as a national security risk, and while that’s the topic du jour for just about everyone right now, I’m much more interested in tracing how it got a foothold in markets outside of China, especially the U.S. with its powerful incumbents.... The answer, I believe, has significant implications for the future of cross-border tech competition, as well as for understanding how product developers achieve product-market-fit. The rise of TikTok updated my thinking. It turns out that in some categories, a machine learning algorithm significantly responsive and accurate can pierce the veil of cultural ignorance...
How to Think Like an Epidemiologist
Don’t worry, a little Bayesian analysis won’t hurt you...
Microsoft researchers claim ‘state-of-the-art’ biomedical NLP model
Domain-specific pretraining of language models helps, at least in highly specific domains such as biomedical...
How graph technologies are being used to solve complex business problems
In this episode of the Data Exchange I speak with Denise Gosnell, Chief Data Officer at DataStax. This conversation is a great introduction to what has become an important class of technologies and tools. Graph technologies are used to power a wide array of applications, including recommendation engines, fraud detection systems, identity and access management, search, and many other use cases...
How We’re Solving Data Discovery Challenges at Shopify
Indexing your data sets and documenting them is another example of a problem lots of companies have, but there’s no great tool for it, so everyone builds their own. Here’s Shopify’s...
Dealing with Overconfidence in Neural Networks: Bayesian Approach
I trained a multi-class classifier on images of cats, dogs and wild animals and passed it an image of myself; it’s 98% confident I’m a dog. The problem isn’t that I passed an inappropriate image, because models in the real world are passed all sorts of garbage. It’s that the model is overconfident about an image far away from the training data. Instead we expect a more uniform distribution over the classes. The overconfidence makes it difficult to post-process model output (setting a threshold on predictions, etc.), which means it needs to be dealt with by the architecture. In this post I explore a Bayesian method for dealing with overconfident predictions for inputs far away from training data in neural networks. The method is called last layer Laplace approximation (LLLA)...
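To make the overconfidence idea concrete, here is a minimal sketch for a binary linear last layer. It is not the code from the linked post: the weights, the diagonal posterior covariance, and the input are hypothetical values chosen for illustration, and the Bayesian prediction uses the standard probit approximation to the logistic-Gaussian integral rather than whatever the author's implementation does.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def map_confidence(w, x):
    """Point-estimate prediction: sigmoid of the logit w . x."""
    mu = sum(wi * xi for wi, xi in zip(w, x))
    return sigmoid(mu)


def laplace_confidence(w, cov_diag, x):
    """Approximate predictive under a Gaussian posterior N(w, diag(cov_diag)).

    Uses the probit approximation sigmoid(mu / sqrt(1 + pi/8 * var)),
    where var = x^T diag(cov_diag) x is the variance of the logit.
    """
    mu = sum(wi * xi for wi, xi in zip(w, x))
    var = sum(ci * xi * xi for ci, xi in zip(cov_diag, x))
    return sigmoid(mu / math.sqrt(1.0 + math.pi / 8.0 * var))


# Hypothetical values, for illustration only.
w = [1.0, -0.5]         # assumed MAP weights of the last layer
cov_diag = [0.5, 0.5]   # assumed diagonal of the Laplace posterior covariance
x0 = [1.0, 0.2]         # assumed feature vector

# Move the input further and further from the origin.
for scale in (1, 10, 100):
    x = [scale * xi for xi in x0]
    print(scale, round(map_confidence(w, x), 4),
          round(laplace_confidence(w, cov_diag, x), 4))
```

As the input is scaled up, the point-estimate confidence tends to 1, while the Laplace-based confidence levels off because the logit variance grows at the same rate as the logit itself.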
AI Face/Off — Fawkes vs. NicOrNot
Testing Facial Recognition Evasion on Nic Cage...
A very short history of some times we solved AI
This is obviously a very selective list, and I could easily find a handful more examples of when we solved the most important challenge for artificial intelligence and created software systems that were truly intelligent. These were all moments that changed everything, after which nothing would ever be the same. Because we made the machine do something that everyone agreed required true intelligence, the writing was on the wall for human cognitive superiority. We've been prognosticating the imminent arrival of our new AI overlords since at least the 50s. Beyond the sarcasm, what is it I want to say with this?...
5 Spark Best Practices For Data Science
It takes time to learn how to make Spark do its magic, but these 5 practices really pushed my project forward and sprinkled some Spark magic on my code. To conclude, this is the post I was looking for (and didn’t find) when I started my project. I hope you found it just in time...
Training*
Quick Question For You: Do you want a Data Science job?
After helping hundreds of readers like you get Data Science jobs, we've distilled all the real-world-tested advice into a self-directed course.
The course is broken down into three guides:
Data Science Getting Started Guide. This guide shows you how to figure out the knowledge gaps that MUST be closed in order for you to become a data scientist quickly and effectively (as well as the ones you can ignore)
Data Science Project Portfolio Guide. This guide teaches you how to start, structure, and develop your data science portfolio with the right goals and direction so that you are a hiring manager's dream candidate
Data Science Resume Guide. This guide shows how to make your resume promote your best parts, what to leave out, how to tailor it to each job you want, as well as how to make your cover letter so good it can't be ignored!
Click here to learn more ...
*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!
Jobs
Data Scientist (Entry Level) - Saturn Cloud - Remote
Saturn Cloud helps companies perform data science at a new level of scale, with one-click solutions, to solve the world’s hardest problems. Our product is a SaaS platform which equips data science teams with high-leverage automation tools, eliminating hours of traditional, manual work. The platform is user-friendly, scalable and secure.
You will be an entry-level Data Scientist for Saturn Cloud, an exciting new venture founded by the creators of Anaconda, NumPy, and SciPy. The role features drafting the first generation of Saturn resource materials, tutorials, and technical content...
Want to post a job here? Email us for details >> team@datascienceweekly.org
Training & Resources
Writing a better code with pytorch and einops
These code fragments are taken from official tutorials and popular repositories. Learn how to improve the code and how einops can help you...
Efficient Serverless deployment of PyTorch models on Azure
A tutorial for serving models cost-effectively at scale using Azure Functions and ONNX Runtime...
Introducing Wildebeest, a Python File-Processing Framework
Wildebeest is ShopRunner's open-source file-processing framework that takes care of parallelizing over files, handling errors, skipping files that had already been processed, and recording results...
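For readers who want a feel for the pattern described above, here is a rough standard-library sketch of it. This is not Wildebeest's API: `process_files`, the `.out` suffix, and the record fields are hypothetical names made up for this illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def process_files(paths, func, out_dir, max_workers=4):
    """Apply func to each file's text in parallel and return a run report.

    Hypothetical helper: skips files whose output already exists, records
    errors instead of raising, and returns one record per input path.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    def run_one(path):
        path = Path(path)
        out_path = out_dir / (path.name + ".out")
        record = {"path": str(path), "skipped": False, "error": None}
        if out_path.exists():
            # Output is already there from an earlier run, so skip the work.
            record["skipped"] = True
            return record
        try:
            out_path.write_text(func(path.read_text()))
        except Exception as exc:
            # Keep going on failure; the error is stored in the report.
            record["error"] = repr(exc)
        return record

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so records line up with paths.
        return list(pool.map(run_one, paths))
```

Calling `process_files(paths, str.upper, "out")` twice would process each readable file on the first run and mark it as skipped on the second, with unreadable files showing up in the report through their `error` field.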
Books
Seven Databases in Seven Weeks:
A Guide to Modern Databases and the NoSQL Movement
"A book that tries to cover multiple database is a risky endeavor, a book that also provides hands on on each is even riskier but if implemented well leads to a great package. I loved the specific exercises the authors covered. A must read for all big data architects who don’t shy away from coding..."
For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.
P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian