Data Science Weekly - Issue 382
Issue #382 Mar 18 2021
Editor Picks
Why Computers Will Never Write Good Novels
You’ve been hoaxed...The hoax seems harmless enough. A few thousand AI researchers have claimed that computers can read and write literature. They’ve alleged that algorithms can unearth the secret formulas of fiction and film. That Bayesian software can map the plots of memoirs and comic books...The hoax, after all, is everywhere: college classrooms, public libraries, quiz games, IBM, Stanford, Oxford, Hollywood...Yet despite all this gaudy credentialing, the hoax is a complete cheat, a total scam, a fiction of the grossest kind. Computers can’t grasp the most lucid haiku. Nor can they pen the clumsiest fairytale. Computers cannot read or write literature at all. And they never, never will...I can prove it to you...
Wine & Math: A Model Pairing
Mention predictive modeling to the general public and you’re likely to conjure memes of complex mathematical equations swirling. Mention wine, and you get a much different reaction. One can be intimidating, the other inviting...In this piece, we’re going to try to close that gap. We’ll build a statistical model trying to predict a wine’s quality by its properties. So grab some liquid courage in your favorite aged grape variety and get ready for MATH...
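To make "predicting quality from properties" concrete, here is a minimal sketch of a one-variable least-squares fit in plain Python. It is not the model from the article: the alcohol and quality numbers are made-up toy values, and the names fit_line, alcohol, and quality are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variance terms (unnormalized; the 1/n factors cancel).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy data, invented for illustration only (not real wine measurements).
alcohol = [9.0, 10.0, 11.0, 12.0, 13.0]   # hypothetical alcohol content, % vol
quality = [4.5, 5.0, 5.5, 6.0, 6.5]       # hypothetical quality scores

slope, intercept = fit_line(alcohol, quality)
predicted = slope * 12.5 + intercept      # predicted score for a 12.5% wine
print(f"quality = {slope:.2f} * alcohol + {intercept:.2f}; 12.5% gives {predicted:.2f}")
```

A real model would likely use several properties at once and held-out data to check the fit; this sketch only shows the basic idea of estimating a relationship from examples.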
Which color scale to use when visualizing data
When visualizing data, you’re almost always working with color – e.g., with different hues (red, yellow, blue) for categories or color gradients (light blue, medium blue, dark blue) for maps...If you use them to visualize data, hue palettes and gradients become “color scales.” That’s because they all “map” to some data: For example, every one of your hues stands for a certain category and every color in your gradient stands for a certain value (range)...This article gives you an overview of the different color scales...
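The idea that a color scale "maps" to data can be sketched in a few lines of Python. This is an illustrative sketch, not code from the article: the RGB values, the category names, and the functions lerp_color and sequential_scale are all assumptions chosen for the example.

```python
def lerp_color(start, end, t):
    """Linearly interpolate between two RGB tuples; t runs from 0.0 to 1.0."""
    return tuple(round(s + (e - s) * t) for s, e in zip(start, end))


def sequential_scale(value, vmin, vmax, start=(220, 236, 248), end=(8, 48, 108)):
    """Map a number in [vmin, vmax] onto a light-to-dark blue gradient."""
    t = (value - vmin) / (vmax - vmin)
    t = min(1.0, max(0.0, t))  # clamp values that fall outside the range
    return lerp_color(start, end, t)


# A categorical scale: every hue stands for one category (hypothetical names).
CATEGORY_HUES = {
    "apples": (214, 39, 40),    # red
    "bananas": (255, 221, 51),  # yellow
    "plums": (31, 119, 180),    # blue
}

low = sequential_scale(0, 0, 100)
mid = sequential_scale(50, 0, 100)
high = sequential_scale(100, 0, 100)
print(low, mid, high, CATEGORY_HUES["plums"])
```

The gradient function covers continuous values, while the dictionary lookup covers categories. Interpolating in RGB is the simplest choice here; perceptually uniform color spaces are often a better fit for real charts.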
A Message from this week's Sponsor:
Get exclusive content to fuel your breakthroughs at The Edge –
powered by Z by HP & Nvidia
Meet the demands of your workflows with articles, case studies, videos, podcasts, webinars and more, at the new Z by HP data science center. Hit the ground running with the latest research and industry trends, and, for an extra dose of motivation, check out our Ambassador section. There you’ll find our ambassadors’ experiences, favorite tools, and data science goals for the future, all of which can help you turn your data into transformative business results.
Check it out.
Data Science Articles & Videos
Measuring Diversity
If someone who is not a white man searches for “CEO pictures” and sees a page of white men, they may feel that only white men can be CEOs, further perpetuating lack of representation at companies’ executive levels...Using the careful quantification outlined in a recent paper, Diversity and Inclusion Metrics in Subset Selection, we can quantify biases and push these systems to return a wider range of results...The mathematics of all this is a little easier to follow with abstract shapes. Let’s take a look at some of them...
The hidden fight to stop illegal fishing from destroying our oceans
When trawlers turn off their transponders, they "go dark", allowing them to hide illicit activity such as illegal fishing and modern slavery. Now, a team of ocean experts is using satellite data to light them back up...
Data Science for Marketing Optimization - Case Studies from Airbnb, Lyft, DoorDash
In this article, we'll look at several case studies of data science being used to optimize marketing efforts at companies like Lyft, Airbnb, Netflix, DoorDash, Wolt, Rovio Entertainment...
We Failed to Set Up a Data Catalog 3 Times. Here’s Why.
We worked with a wide variety of data, everything from 600+ government data sources to unstructured data sources like satellite imagery. Our data grew faster than we expected, and we hadn’t really planned how to store or access it beforehand. We quickly realized we needed a central repository to help our team discover, understand, and build trust in all the data sets we were working with...We thought it would be easy enough to figure this out, but we couldn’t have been more wrong. Here’s the story of how it took 4 attempts and 5 years to finally succeed in implementing a successful data catalog for our team...
Growing 3D Artefacts and Functional Machines with Neural Cellular Automata
Neural Cellular Automata (NCAs) have been proven effective in simulating morphogenetic processes, the continuous construction of complex structures from very few starting cells...In this work, we propose an extension of NCAs from 2D to 3D, utilizing 3D convolutions in the proposed neural network architecture. Minecraft is selected as the environment for our automaton since it allows the generation of both static structures and moving machines. We show that despite their simplicity, NCAs are capable of growing complex entities such as castles, apartment blocks, and trees, some of which are composed of over 3,000 blocks. Additionally, when trained for regeneration, the system is able to regrow parts of simple functional machines, significantly expanding the capabilities of simulated morphogenetic systems...
Intruder Detection Using Raspberry Pi Pico and Edge Impulse [How-to Guide]
A system that detects intruders in the dark using a low-resolution thermal camera connected to a Raspberry Pi Pico and a TensorFlow Lite model...
Deep Learning for Land Cover Classification of Satellite Imagery Using Python
This article helps readers to better understand different Deep Learning methods that can be used for land cover classification of Sundarbans satellite data using Python...Remote sensing is the process of detecting and monitoring the physical characteristics of an area by measuring its reflected and emitted radiation at a distance (typically from satellite or aircraft). Special cameras collect remotely sensed images, which help researchers “sense” things about the Earth...
Data Documentation Woes? Here’s a Framework
In 2016, I was at the helm of a data team that was rapidly scaling...Then...our oldest data team member, someone who’d been with us for two years, told me that he wanted to quit...That incident marked the start of the Assembly Line Project: an effort to make our data team as agile and resilient as possible. Over two years, we created internal tools and frameworks to help our team run better, and we also learned a lot about building a stronger data culture around the principles of self-organization and transparency...In this article, I’ll share the principles and framework we use to organize our own data team...democratize our data, and make documentation a part of our daily workflow...
Partial Differential Equations is All You Need for Generating Neural Architectures – A Theory for Physical Artificial Intelligence Systems
In this work, we generalize the reaction-diffusion equation in statistical physics, Schrödinger equation in quantum mechanics, Helmholtz equation in paraxial optics into the neural partial differential equations (NPDE), which can be considered as the fundamental equations in the field of artificial intelligence research...
Never a dill moment: Exploiting machine learning pickle files
Many machine learning (ML) models are Python pickle files under the hood, and it makes sense. The use of pickling conserves memory, enables start-and-stop model training, and makes trained models portable (and, thereby, shareable)...Here, we discuss the underhanded antics that can occur simply from loading an untrusted pickle file or ML model. In the process, we introduce a new tool, Fickling, that can help you reverse engineer, test, and even create malicious pickle files. If you are an ML practitioner, you’ll learn about the security risks inherent in standard ML practices. If you are a security engineer, you’ll learn about a new tool that can help you construct and forensically examine pickle files. Either way, by the end of this article, pickling will hopefully leave a sour taste in your mouth...
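Why loading an untrusted pickle is risky can be shown with the standard library alone. The sketch below is not Fickling and not the article's code: the Payload class is hypothetical, and it deliberately uses the harmless built-in sorted where an attacker could name any importable callable. The opcode scan at the end is a naive illustration, not a reliable safety check.

```python
import pickle
import pickletools


class Payload:
    """Hypothetical object whose pickle asks the loader to call a function."""

    def __reduce__(self):
        # pickle stores "call this callable with these arguments" and the
        # loader obeys. Here the callable is the harmless built-in sorted().
        return (sorted, ([3, 1, 2],))


data = pickle.dumps(Payload())

# Loading runs the stored call: the result is sorted()'s return value,
# not a Payload instance.
result = pickle.loads(data)
print(result)

# The byte stream can be inspected without executing it. pickletools.genops
# yields (opcode, argument, position) tuples; REDUCE is the opcode that
# triggers a call at load time.
opcode_names = {opcode.name for opcode, _arg, _pos in pickletools.genops(data)}
print("contains REDUCE:", "REDUCE" in opcode_names)
```

Many legitimate pickles also contain REDUCE, so its presence alone proves nothing. The practical takeaway is to call pickle.loads only on data from sources you trust.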
Training*
Online Data Science Programs from Drexel University
Find your algorithm for success with an online data science degree from Drexel University. Gain essential skills in tool creation and development, data and text mining, trend identification, and data manipulation and summarization, using leading industry technology that you can apply to your career. Learn more.
*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!
Jobs
Data Scientist - HelloFresh - Chicago, IL or New York, NY
Embedded in the NYC Tech Hub, we are building a cross-functional team of data scientists, analysts and engineers with the mission to bring the modeling and analytical capabilities of our marketing organization to the next level.
As a Data Scientist, you will support the analytic needs of our Growth organization comprising Technology, Digital Product and Marketing. You will play a pivotal role in helping us continue to succeed as the leading global meal kit provider. This role will solve challenging problems using vast repositories of customer data to provide detailed and actionable insights; core responsibilities include the development and automation of Marketing BI tools, predictive modeling, professional-grade dashboarding and reporting for some of our most critical initiatives and enhancing and facilitating the information extraction process...
Want to post a job here? Email us for details >> team@datascienceweekly.org
Training & Resources
Cookiecutter Data Science
A logical, reasonably standardized, but flexible project structure for doing and sharing data science work...
Kedro-Airflow 0.4.0 — Orchestrating Kedro Pipelines with Airflow
Kedro is an open-source Python framework for creating reproducible, maintainable, and modular data science code. Its focus is on authoring code and not orchestrating, scheduling and monitoring pipeline runs. We emphasise infrastructure independence, and this is crucial for consultancies...where Kedro was born...Workflows in Airflow are modelled and organised as DAGs, making it a suitable engine to orchestrate and execute a pipeline authored with Kedro. To keep the workflow seamless, we are pleased to unveil the latest version of the Kedro-Airflow plugin, which simplifies deployment of a Kedro project on Airflow...
Applied Machine Learning (Cornell Tech CS 5787, Fall 2020) [ 80 videos ]
Lecture videos and materials from the Applied Machine Learning course at Cornell Tech, taught in Fall 2020...Starting from the very basics, we cover all of the most important ML algorithms and how to apply them in practice...One new idea we tried in this course was to make all the materials executable. The slides are also Jupyter notebooks with programmatically generated figures. Readers can tweak parameters and regenerate the figures themselves...Also, whenever we introduce an important mathematical formula, we implement it in numpy. This helps establish connections between the math and how to apply it in code...
Books
Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits
Integrate scikit-learn with various tools such as NumPy, pandas, imbalanced-learn, and scikit-surprise and use it to solve real-world machine learning problems...
For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.
P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian