Data Science Weekly - Issue 393
Issue #393 June 03 2021
Editor Picks
Generating Coherent Noise using Fourier Transforms
Recently, I came across a paper by Paul Bourke that outlined a method of producing 3D terrain (or simply Fractal Noise) using Fourier transforms (FT). It seemed like it shouldn’t work, but it did and I just had to find out why...This curiosity resulted in a weeks-long quest with my friend Shubham Gupta to implement this algorithm and find out why this works, if this works...Here, I will share what we found while researching this method and the reason why it works.
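To give a feel for the general idea, here is a minimal 1D sketch of spectral synthesis: build a random spectrum whose amplitude falls off with frequency, then apply an inverse Fourier transform. This is not the article's or Bourke's code; the function name `fractal_noise_1d`, the exponent `beta=2.0`, and the naive inverse DFT are all assumptions chosen to keep the example short and dependency-free.

```python
import cmath
import math
import random


def fractal_noise_1d(n=64, beta=2.0, seed=0):
    """Return n real samples whose power spectrum falls off roughly as 1/f**beta.

    Hypothetical helper for illustration; n is assumed to be even.
    """
    rng = random.Random(seed)
    spectrum = [0j] * n  # DC and Nyquist terms stay zero, so the mean is zero
    for k in range(1, n // 2):
        amplitude = k ** (-beta / 2.0)            # power ~ amplitude**2 ~ 1/k**beta
        phase = rng.uniform(0.0, 2.0 * math.pi)   # independent random phase
        spectrum[k] = cmath.rect(amplitude, phase)
        spectrum[n - k] = spectrum[k].conjugate()  # Hermitian symmetry gives real output
    # Naive O(n^2) inverse discrete Fourier transform; fine for small n.
    samples = []
    for t in range(n):
        total = sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n))
        samples.append(total.real / n)
    return samples


noise = fractal_noise_1d()
```

Larger values of `beta` suppress high frequencies more strongly and should give smoother curves; the same construction extends to 2D grids for terrain, where a proper FFT routine would replace the naive loop.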
Interactive Gaussian Process Visualization
A Gaussian process can be thought of as an extension of the multivariate normal distribution to an infinite number of random variables covering each point on the input domain. The covariance between function values at any two points is given by the evaluation of the kernel of the Gaussian process. For an in-depth explanation, read this excellent distill.pub article and then come back to this interactive visualisation!...
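As a rough illustration of "the covariance is the kernel evaluated at two points", the sketch below builds a covariance matrix from a kernel and draws one sample from a zero-mean Gaussian process prior. It is not taken from the visualization itself: the squared-exponential kernel, the `jitter` term, and all function names are assumptions made for this example.

```python
import math
import random


def rbf_kernel(x1, x2, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel: covariance between f(x1) and f(x2)."""
    return variance * math.exp(-0.5 * ((x1 - x2) / length_scale) ** 2)


def covariance_matrix(xs, kernel, jitter=1e-6):
    """Evaluate the kernel on every pair of inputs; jitter aids numerical stability."""
    return [[kernel(a, b) + (jitter if i == j else 0.0) for j, b in enumerate(xs)]
            for i, a in enumerate(xs)]


def cholesky(matrix):
    """Return lower-triangular L such that L times its transpose equals matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def sample_gp_prior(xs, kernel, seed=0):
    """Draw one function sample from a zero-mean GP prior at the points xs."""
    rng = random.Random(seed)
    lower = cholesky(covariance_matrix(xs, kernel))
    z = [rng.gauss(0.0, 1.0) for _ in xs]  # independent standard normals
    # Multiplying by L correlates the draws according to the kernel.
    return [sum(lower[i][k] * z[k] for k in range(i + 1)) for i in range(len(xs))]


xs = [i * 0.5 for i in range(8)]
sample = sample_gp_prior(xs, rbf_kernel)
```

Nearby inputs get a covariance close to the kernel variance, so their sampled values tend to be similar; shrinking `length_scale` should make the sampled function wigglier.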
Deepfake Maps Could Really Mess With Your Sense of the World
Researchers applied AI techniques to make portions of Seattle look more like Beijing. Such imagery could mislead governments or spread misinformation online...
A Message from this week's Sponsor:
Online Data Science Programs from Drexel University
Find your algorithm for success with an online data science degree from Drexel University. Gain essential skills in tool creation and development, data and text mining, trend identification, and data manipulation and summarization by using leading industry technology to apply to your career. Learn more.
Data Science Articles & Videos
Can You Build a Machine Learning Model to Monitor Another Model?
Can you train a machine learning model to predict your model’s mistakes? Nothing stops you from trying. But chances are, you are better off without it...We’ve seen this idea suggested more than once...It sounds reasonable on the surface. Machine learning models make mistakes. Let us take these mistakes and train another model to predict the missteps of the first one! Sort of a “trust detector,” based on learnings from how our model did in the past...
AndroidEnv: The Android Learning Environment
We introduce AndroidEnv, an open-source platform for Reinforcement Learning (RL) research built on top of the Android ecosystem. AndroidEnv allows RL agents to interact with a wide variety of apps and services commonly used by humans through a universal touchscreen interface. Since agents train on a realistic simulation of an Android device, they have the potential to be deployed on real devices. In this report, we give an overview of the environment, highlighting the significant features it provides for research, and we present an empirical evaluation of some popular reinforcement learning agents on a set of tasks built on this platform...
Machine Learning Deserves Better Than This
This is an excellent overview at Stat on the current problems with machine learning in healthcare. It’s a very hot topic indeed, and has been for some time. There has especially been a flood of manuscripts during the pandemic, applying ML/AI techniques to all sorts of coronavirus-related issues. Some of these have been pretty far-fetched, but others are working in areas where everyone agrees machine learning can be truly useful, such as image analysis...
What I’ve learned about MLOps from speaking with 100+ ML practitioners
Over the past months, I’ve been banging my head to understand what stands behind the overly hyped statement “90/87/85% of the machine learning models never get to production” and what it has to do with MLOps...Without further ado, here are some of the key takeaways around the everyday MLOps challenges my team discovered on this fantastic journey...
Introducing Seasonal Contrast for Remote Sensing
Have you ever wanted to tackle global-scale environmental challenges using satellite imagery but found yourself constrained due to a lack of annotations despite the vast amount of publicly available data?...Here we introduce Seasonal Contrast (SeCo), a self-supervised method for pre-training visual representations from the seasonal changes that occur in different regions of the Earth...
IT Threat Detection with Similarity Search
This notebook shows how...to build an application for detecting rare events. Such applications are common in the cyber-security and fraud detection domains, where only a tiny fraction of the events are malicious...Here we will build a network intrusion detector. Network intrusion detection systems monitor incoming and outgoing network traffic flow, raising alarms whenever a threat is detected. Here we use a deep-learning model and similarity search to detect and classify network intrusion traffic...
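The similarity-search half of that approach can be sketched in a few lines: embed each event as a vector, then label a new event by the labels of its most similar stored vectors. The code below is only an illustration, not the notebook's implementation; the helper names, the use of cosine similarity with a majority vote, and the toy vectors (which stand in for a model's embeddings and are not real traffic data) are all assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def classify_by_nearest(query, index, k=3):
    """Label a query by majority vote among its k most similar indexed vectors."""
    ranked = sorted(index, key=lambda item: cosine_similarity(query, item[0]),
                    reverse=True)
    top_labels = [label for _, label in ranked[:k]]
    return max(set(top_labels), key=top_labels.count)


# Toy, made-up embeddings standing in for a model's output; not real traffic data.
index = [
    ([0.9, 0.1, 0.0], "benign"),
    ([0.8, 0.2, 0.1], "benign"),
    ([0.85, 0.15, 0.05], "benign"),
    ([0.1, 0.9, 0.3], "malicious"),
    ([0.0, 0.8, 0.5], "malicious"),
    ([0.2, 0.85, 0.4], "malicious"),
]

label = classify_by_nearest([0.05, 0.9, 0.4], index)
```

A brute-force scan like this is fine for a handful of vectors; at real traffic volumes a dedicated vector index would normally take its place.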
Do Wide and Deep Networks Learn the Same Things?
In “Do Wide and Deep Networks Learn the Same Things? Uncovering How Neural Network Representations Vary with Width and Depth”, we perform a systematic study of the similarity between wide and deep networks from the same architectural family through the lens of their hidden representations and final outputs. In very wide or very deep models, we find a characteristic block structure in their internal representations, and establish a connection between this phenomenon and model overparameterization...
Neural Networks Emulate Any Guitar Pedal For $120
It’s a well-established fact that a guitarist’s acumen can be accurately gauged by the size of their pedal board: the more stompboxes, the better the player...Jokes aside, the idea of replacing an entire pedal collection with a single box is nothing new...Just released by [GuitarML], the NeuralPi takes about $120 of hardware (including, you guessed it, a Raspberry Pi) and transforms it into the perfect pedal...The key here, of course, is neural networks...
PyTorch builds the future of AI and machine learning at Facebook
Facebook’s AI models perform trillions of inference operations every day for the billions of people that use our technologies. Meeting this growing workload demand means we have to continually evolve our AI frameworks. Which is why, today we’re announcing that we’re migrating all our AI systems to PyTorch...Today, over a year into the migration process, there are more than 1,700 PyTorch-based inference models in full production at Facebook, and 93 percent of our new training models — those responsible for identifying and analyzing content on Facebook — are on PyTorch...
MIDS W209 Information Visualization Slides
Slides source code for the W209 Information Visualization course of the Masters in Data Science at UC Berkeley. Designed by John Alexis Guerra Gomez and Andy Reagan...Apart from the slides included in this code, you can also find below links to the videos and Observable notebooks used for each module...
Training*
Quick Question For You: Do you want a Data Science job?
After helping hundreds of readers like you get Data Science jobs, we've distilled all the real-world-tested advice into a self-directed course.
The course is broken down into three guides:
Data Science Getting Started Guide. This guide shows you how to figure out the knowledge gaps that MUST be closed in order for you to become a data scientist quickly and effectively (as well as the ones you can ignore)
Data Science Project Portfolio Guide. This guide teaches you how to start, structure, and develop your data science portfolio with the right goals and direction so that you are a hiring manager's dream candidate
Data Science Resume Guide. This guide shows how to make your resume promote your best parts, what to leave out, how to tailor it to each job you want, as well as how to make your cover letter so good it can't be ignored!
Click here to learn more ...
*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!
Jobs
Senior Data Scientist - WarnerMedia - New York, NY
WarnerMedia is a leading media and entertainment company that creates and distributes premium and popular content from a diverse array of talented storytellers and journalists to global audiences through its consumer brands including: HBO, HBO Max, Warner Bros., TNT, TBS, truTV, CNN, DC Entertainment, New Line, Cartoon Network, Adult Swim, Turner Classic Movies and others.
Reporting to the Sr. Manager, Data Science, this role will help to develop the predictive insights and prescriptive capabilities behind CNN’s emerging products, transforming first- and third-party data into quantitative findings, visualizations, and automation.
Want to post a job here? Email us for details >> team@datascienceweekly.org
Training & Resources
Don’t start learning data science with neural networks
A brief overview of why you must not start studying data science with neural networks...I often meet students that start their journey towards data science with Keras, TensorFlow and, generally speaking, Deep Learning. They build tons of neural networks like crazy, but in the end they fail with their models because they don’t know machine learning well enough, nor are they able to apply the pre-processing techniques needed to make neural networks work...Here’s why, if you start your career as a data scientist, you don’t need to start with Deep Learning...
Hugging Face Monthly Email
Get the latest news from Hugging Face in a monthly email: NLP papers, open source updates, new models and datasets, community highlights, useful tutorials and more!...
A Concrete Introduction to Probability (using Python)
In 1814, Pierre-Simon Laplace wrote: Probability theory is nothing but common sense reduced to calculation...[Probability] is thus simply a fraction whose numerator is the number of favorable cases and whose denominator is the number of all the cases possible ... when nothing leads us to expect that any one of these cases should occur more than any other...Laplace nailed it. To untangle a probability problem, all you have to do is define exactly what the cases are, and carefully count the favorable and total cases. Let's be clear on our vocabulary words...
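Laplace's definition translates almost directly into code. The sketch below is our own illustration rather than the notebook's code: the function name `probability` and the two-dice example are assumptions, and it only applies when every outcome in the sample space is equally likely.

```python
from fractions import Fraction
from itertools import product


def probability(event, sample_space):
    """Laplace's definition: favorable cases divided by all equally likely cases."""
    favorable = [outcome for outcome in sample_space if outcome in event]
    return Fraction(len(favorable), len(sample_space))


# Sample space: every ordered outcome of rolling two fair six-sided dice.
two_dice = set(product(range(1, 7), repeat=2))
# Event: the two dice sum to 7.
sum_is_seven = {roll for roll in two_dice if sum(roll) == 7}

print(probability(sum_is_seven, two_dice))  # 1/6
```

Using `Fraction` keeps the answer exact, so 6 favorable cases out of 36 reduces to 1/6 instead of a rounded float.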
Books
Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits
Integrate scikit-learn with various tools such as NumPy, pandas, imbalanced-learn, and scikit-surprise and use it to solve real-world machine learning problems...
For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.
P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian