[in case you missed it] Data Science Weekly - Issue 386
Issue #386 Apr 15 2021
Editor Picks
Chartability - a11y accessibility guide
Chartability is a methodology for ensuring that data visualizations, systems, and interfaces are accessible. Chartability is organized into principles with testable criteria and focused on creating an outcome that is an inclusive data experience for people with disabilities... Chartability is organized into 7 principles, 4 common to the accessibility space: Perceivable, Operable, Understandable, and Robust (POUR) plus 3 more that extend from Robust: Compromising, Assistive, Flexible (CAF)...
Smart Interfaces for Human-Centered AI [Video]
In HAI's weekly seminar, Stanford Professor of Computer Science James Landay, a Stanford HAI associate director, explains three ways we can address grand challenges in the fields of health and education by building systems that balance innovative user interfaces with intelligent systems...
SkinDeep
I planned this project after watching Justin Bieber's "Anyone" music video. He had his tattoo covered up with the help of artists airbrushing him for hours. The results were amazing in the music video. Producing that sort of video output can be difficult, so I opted for images. Can deep learning do a decent job, or can it even match Photoshop? This was the starting point of this project...
A Message from this week's Sponsor:
Get exclusive content to fuel your breakthroughs at The Edge, powered by Z by HP & Nvidia
Meet the demands of your workflows with articles, case studies, videos, podcasts, webinars and more, at the new Z by HP data science center. Hit the ground running with the latest research and industry trends, and, for an extra dose of motivation, check out our Ambassador section. There you’ll find our ambassadors’ experiences, favorite tools, and data science goals for the future, all to help you turn your data into transformative business results.
Check it out.
Data Science Articles & Videos
NMF — A visual explainer and Python Implementation
Gain an intuition for the unsupervised learning algorithm that allows data scientists to extract topics from texts, photos, and more, and build those handy recommendation systems. NMF explanation is followed by a Python Implementation on a toy example of topic modelling on Presidential Inauguration Speeches...
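For a rough feel of what NMF does (this is not the linked article's code), here is a minimal sketch that factors a small non-negative matrix V into two non-negative matrices W and H using the classic multiplicative update rules. The function name `nmf`, the toy document-term matrix, and every parameter value are assumptions made up for this illustration.

```python
import numpy as np

def nmf(V, rank, n_iter=500, eps=1e-9, seed=0):
    """Approximate non-negative V (m x n) as W (m x rank) @ H (rank x n)."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    # Random non-negative starting point
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep every entry non-negative
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Hypothetical toy data: rows are documents, columns are word counts
V = np.array([
    [3, 2, 0, 0],
    [4, 3, 0, 0],
    [0, 0, 2, 5],
    [0, 0, 1, 4],
], dtype=float)

W, H = nmf(V, rank=2)
reconstruction_error = np.linalg.norm(V - W @ H)
```

In a topic-modelling reading, each row of H can be viewed as a "topic" (a weighting over words) and each row of W as how strongly a document expresses each topic. In practice a library implementation such as scikit-learn's `NMF` is the more usual choice.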
Representation Learning for Networks in Biology and Medicine: Advancements, Challenges, and Opportunities
In this review, we put forward an observation that long-standing principles of network biology and medicine -- while often unspoken in machine learning research -- can provide the conceptual grounding for representation learning, explain its current successes and limitations, and inform future advances. We synthesize a spectrum of algorithmic approaches that, at their core, leverage topological features to embed networks into compact vector spaces. We also provide a taxonomy of biomedical areas that are likely to benefit most from algorithmic innovation. Representation learning techniques are becoming essential for identifying causal variants underlying complex traits, disentangling behaviors of single cells and their impact on health, and diagnosing and treating diseases with safe and effective medicines...
DexYCB: A Benchmark for Capturing Hand Grasping of Objects
We introduce DexYCB, a new dataset for capturing hand grasping of objects. We first compare DexYCB with a related one through cross-dataset evaluation. We then present a thorough benchmark of state-of-the-art approaches on three relevant tasks: 2D object and keypoint detection, 6D object pose estimation, and 3D hand pose estimation. Finally, we evaluate a new robotics-relevant task: generating safe robot grasps in human-to-robot object handover...
The Data Science Maker: Live plotting data with Matplotlib and Raspberry Pi
This is my first post in a series of posts titled “The Data Science Maker” where I focus on the integration of data science software skills with consumer hardware for creative applications. The series is meant for those new or newly interested in electronics hardware, programming, and data science. Conversations in the comments are encouraged...
The story behind a baseline
Before you start building a machine learning model, you need a baseline...I find it helpful to think about 3 different levels and tackle them in order...Here is how I do this...
DatasetGAN: Efficient Labeled Data Factory with Minimal Human Effort
We introduce DatasetGAN: an automatic procedure to generate massive datasets of high-quality semantically segmented images requiring minimal human effort. Current deep networks are extremely data-hungry, benefiting from training on large-scale datasets, which are time consuming to annotate. Our method relies on the power of recent GANs to generate realistic images. We show how the GAN latent code can be decoded to produce a semantic segmentation of the image. Training the decoder only needs a few labeled examples to generalize to the rest of the latent space, resulting in an infinite annotated dataset generator!...
Data Smoothing in Data Science Visualization (The Goldilocks Trio)
A gentle journey into LOWESS and B-spline with reasons why...
Should Graph Neural Networks Use Features, Edges, Or Both?
Graph Neural Networks (GNNs) are the first choice for learning algorithms on graph data. GNNs promise to integrate (i) node features as well as (ii) edge information in an end-to-end learning algorithm. How does this promise work out practically? In this paper, we study to what extent GNNs are necessary to solve prominent graph classification problems. We find that for graph classification, a GNN is not more than the sum of its parts. We also find that, unlike features, predictions with an edge-only model do not always transfer to GNNs...
Distrax
Distrax is a lightweight library of probability distributions and bijectors. It acts as a JAX-native reimplementation of a subset of TensorFlow Probability (TFP), with some new features and emphasis on extensibility...
Self-Supervised Voice Emotion Recognition Using Transfer Learning
Have you noticed the unsettling feeling of hearing someone say something utterly terrible in a joyful tone of voice?...My goal for this project was to build a self-supervised binary emotion classifier from speech audio. Emotions are complex multidimensional concepts, but in this project, I have built a model that given an audio clip predicts whether the emotion of the voice is positive or negative...
Infrastructure / Tools *
What is Data Observability?
Investing in data observability is becoming increasingly important as companies collect more and more (often third-party) data. Introducing new data sources and expanding access to new data consumers also leads to more complex pipelines—which increases the opportunities for missing, stale, or duplicate data to affect your business. Barr Moses, CEO and co-founder of Monte Carlo, explains how data engineers can leverage best practices from DevOps and engineering to fix data quality at scale.
*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!
Jobs
Tecton AI - SF / NYC
Tecton is building an enterprise Feature Store that is transforming the way companies solve real-world problems with machine learning at scale. Our founding team created Uber's Michelangelo ML Platform, which has become the blueprint for modern ML platforms in large organizations. We recently received Series B funding from Sequoia Capital and Andreessen Horowitz, have paying enterprise customers, and have growing engineering teams in SF and NYC. The team has years of experience building and operating business-critical machine learning systems at scale at places like Uber, Google, Facebook, Airbnb, Twitter, Quora, and AdRoll...
Software Engineer, Machine Learning
Software Engineer, Data Infrastructure
Software Engineer, Frontend
Want to post a job here? Email us for details >> team@datascienceweekly.org
Training & Resources
A Complete Anomaly Detection Algorithm From Scratch in Python: Step by Step Guide
In this article, I will explain the process of developing an anomaly detection algorithm from scratch in Python...
Stanford's CS224W: Machine Learning with Graphs
Complex data can be represented as a graph of relationships between objects. Such networks are a fundamental tool for modeling social, technological, and biological systems. This course focuses on the computational, algorithmic, and modeling challenges specific to the analysis of massive graphs. By means of studying the underlying graph structure and its features, students are introduced to machine learning techniques and data mining tools apt to reveal insights on a variety of networks...
Autoencoders: things they can do, and a pretty cool example [Twitter thread]
A lot in machine learning is pretty dry and boring, but understanding how autoencoders work feels different...Autoencoders are lossy data compression algorithms built using neural networks...A network encodes (compresses) the original input into an intermediate representation, and another network reverses the process to get the same input back...
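The encode-then-decode idea can be sketched in a few lines of NumPy. This is not the example from the thread: it is a deliberately simplified linear autoencoder (no hidden nonlinearities) trained with plain gradient descent, and the toy data, layer sizes, learning rate, and variable names are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 samples in 5 dimensions that lie on a 2-D subspace
latent = rng.normal(size=(200, 2))
X = latent @ rng.normal(size=(2, 5))
X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize each column

n, d = X.shape
k = 2  # bottleneck size, smaller than d, so the code is a compressed view
W_enc = 0.1 * rng.normal(size=(d, k))  # encoder weights
W_dec = 0.1 * rng.normal(size=(k, d))  # decoder weights

def loss(X, W_enc, W_dec):
    """Mean squared reconstruction error per sample."""
    X_hat = (X @ W_enc) @ W_dec
    return np.mean(np.sum((X_hat - X) ** 2, axis=1))

initial_loss = loss(X, W_enc, W_dec)

lr = 0.01
for _ in range(2000):
    Z = X @ W_enc                   # encode: compress to k dimensions
    X_hat = Z @ W_dec               # decode: reconstruct d dimensions
    G = 2.0 * (X_hat - X) / n       # gradient of the loss w.r.t. X_hat
    grad_dec = Z.T @ G
    grad_enc = X.T @ (G @ W_dec.T)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

final_loss = loss(X, W_enc, W_dec)
codes = X @ W_enc                   # the intermediate representation
```

Comparing `initial_loss` with `final_loss` shows how much of the input survives the trip through the bottleneck. Real autoencoders typically add nonlinear layers and are built with a deep learning framework.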
Books
Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits
Integrate scikit-learn with various tools such as NumPy, pandas, imbalanced-learn, and scikit-surprise and use it to solve real-world machine learning problems...
For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.
P.S. Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them on board :) All the best, Hannah & Sebastian