Data Science Weekly - Issue 380
Issue #380 Mar 04 2021
Editor Picks
Things your manager might not know
When people talk about “managing up”, sometimes it’s framed as a bad thing – massaging the ego of people in charge so that they treat you well...In my experience, managing up is usually a lot more practical...Here are the facts your manager might not know about you and your team that we’ll cover in this post: a) What’s slowing the team down, b) Exactly what individual people on the team are working on, c) Where the technical debt is, d) How to help you get better at your job, e) What your goals are, f) What issues they should be escalating, g) What extra work you’re doing, and h) How compensation/promotions work at the company...For each one, I’ll give specific ways you can help get them the information they need. All of these ways you can help them will also help you – it’s not just an altruistic endeavor :)...
The Connected Connectome
The most comprehensive wiring map to date of the fruit fly brain has transformed the field of neuroscience, identifying new cell types and reconfiguring circuit models. Are neuroscientists now ready to tackle the mouse brain?...
Watch the winners of this year’s ‘Dance Your Ph.D.’ contest
The Dance Your Ph.D. contest has been challenging scientists to explain their research through dance for 14 years now...Here is the complete list of winners...
A Message from this week's Sponsor:
Quick Question For You: Do you want a Data Science job?
After helping hundreds of readers like you get Data Science jobs, we've distilled all the real-world-tested advice into a self-directed course.
The course is broken down into three guides:
Data Science Getting Started Guide. This guide shows you how to figure out the knowledge gaps that MUST be closed in order for you to become a data scientist quickly and effectively (as well as the ones you can ignore)
Data Science Project Portfolio Guide. This guide teaches you how to start, structure, and develop your data science portfolio with the right goals and direction so that you are a hiring manager's dream candidate
Data Science Resume Guide. This guide shows how to make your resume promote your best parts, what to leave out, how to tailor it to each job you want, as well as how to make your cover letter so good it can't be ignored!
Data Science Articles & Videos
10 Must Read ML Blog Posts
I have been doing NLP/ML research for the last 6 years. I have come across a lot of machine learning resources and papers. Today, I kept thinking about the machine learning / NLP / deep learning related blog posts (not papers) that have been transformational for me. In this blog post, I provide a short collection of a few high-impact blog posts that come to mind...
The Next Evolution of Data Catalogs: Data Discovery Platforms
In this blog post, I’ll explain how most data catalogs approach the data context problem, why their approach falls short, and a better path forward: the data discovery platform...
Robust Trees for Security
Tree models are widely used for security, such as detecting malicious autonomous systems, social engineering, malware distribution, phishing emails, advertising resources for ad blockers, and online scams. Despite their popularity, the robustness of tree models has not been thoroughly studied in the context of security applications. In this post, I will show how to train robust trees to detect Twitter spam. Our most exciting result is that we can increase the feature manipulation cost for adaptive attackers to evade the robust tree ensemble by 10.6X...
The Open Syllabus Project Visualizes the 1,000,000+ Books Most Frequently Assigned in College Courses
The Prince, The Canterbury Tales, The Communist Manifesto, The Souls of Black Folk, The Elements of Style: we’ve read all these, of course. Or at least we’ve read most of them (one or two for sure), if our ever-dimmer memories of high school or college are to be trusted. But we can rest assured that students are reading — or in any case, being assigned — these very same works today, thanks to the Open Syllabus project, which as of this writing has assembled a database of 7,292,573 different college course syllabi...its “Galaxy” now visualizes the 1,138,841 most frequently assigned texts in that database, presenting them in a Google Maps-like interface for your intellectual exploration...
How to represent part-whole hierarchies in a neural network
This paper does not describe a working system. Instead, it presents a single idea about representation which allows advances made by several different groups to be combined into an imaginary system called GLOM. The advances include transformers, neural fields, contrastive representation learning, distillation and capsules...The idea is simply to use islands of identical vectors to represent the nodes in the parse tree. If GLOM can be made to work, it should significantly improve the interpretability of the representations produced by transformer-like systems when applied to vision or language...
Introducing: dbt + Materialize
While dbt is a great tool for transforming batch data, it cannot currently transform streaming data in real time. (The dbt team explicitly warns users about this in a few places.) Here at Materialize, we want to help the world stop batching and start streaming. So we built a dbt adapter that will allow you to transform your streaming data in real time using Materialize as your data warehouse...The rest of this post will explore why dbt works best with batch data and how using Materialize unlocks streaming transformations...
Generative Adversarial Transformers
We introduce the GANsformer, a novel and efficient type of transformer, and explore it for the task of visual generative modeling. The network employs a bipartite structure that enables long-range interactions across the image, while maintaining computation of linear efficiency, that can readily scale to high-resolution synthesis. It iteratively propagates information from a set of latent variables to the evolving visual features and vice versa, to support the refinement of each in light of the other and encourage the emergence of compositional representations of objects and scenes...
Lyra: A New Very Low-Bitrate Codec for Speech Compression
Connecting to others online via voice and video calls is something that is increasingly a part of everyday life. The real-time communication frameworks, like WebRTC, that make this possible depend on efficient compression techniques, codecs, to encode (or decode) signals for transmission or storage. A vital part of media applications for decades, codecs allow bandwidth-hungry applications to efficiently transmit data, and have led to an expectation of high-quality communication anywhere at any time...As such, a continuing challenge in developing codecs, both for video and audio, is to provide increasing quality, using less data, and to minimize latency for real-time communication...we have created Lyra, a high-quality, very low-bitrate speech codec that makes voice communication available even on the slowest networks...
What are Truchet tiles?
In January, I was inspired by the amount of generative art being produced, in particular on my Twitter feed. The hashtag #genuary2021 was full of examples of generative art, with each day having a different “rule” to inspire their participants...With these ideas in mind, two friends independently started conversations with me – they had discovered Truchet tiles...But what are Truchet tiles and how could I use them for my next data visualisation idea?...Truchet tiles are tiles which are not fully rotationally symmetric that are designed so that when placed squarely in a grid they generate interconnecting patterns...
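For a feel of how this works, here is a minimal sketch in Python. It assumes the simplest common variant, a single diagonal line per tile drawn with the characters ╱ and ╲, which may not be the tile design used in the linked post. The name `truchet_grid` and its parameters are hypothetical, chosen only for illustration.

```python
import random

# Two orientations of one diagonal tile. Each is the other rotated by 90 degrees,
# so the tile is not fully rotationally symmetric.
TILES = ("╱", "╲")


def truchet_grid(rows, cols, seed=None):
    """Return a rows x cols grid of randomly oriented diagonal tiles as text."""
    rng = random.Random(seed)  # seeded so the same pattern can be reproduced
    return "\n".join(
        "".join(rng.choice(TILES) for _ in range(cols))
        for _ in range(rows)
    )


if __name__ == "__main__":
    # Print a small pattern; change the seed to get a different arrangement.
    print(truchet_grid(8, 32, seed=380))
```

Because every tile edge meets its neighbour at a corner, the random orientations join up into longer zigzag paths, which is the interconnecting effect described above.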
The Building Blocks of a Modern Data Platform
In this article, I’ll break down what a modern data platform means in practice today. This includes the three core characteristics, six fundamental building blocks, and latest data tools you should know...
Training*
Online Data Science Programs from Drexel University
Find your algorithm for success with an online data science degree from Drexel University. Gain essential skills in tool creation and development, data and text mining, trend identification, and data manipulation and summarization by using leading industry technology to apply to your career. Learn More.
*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!
Jobs
Senior Revenue Data Scientist - Mozilla - Remote
Come join a company with open-source at its heart! Mozilla is wholly owned by a non-profit and strives to build products that keep the Internet open, accessible, and secure for everyone. You’ll be part of our Data Org where you’ll join a talented team of Data Scientists and Data Engineers. We have a mature data pipeline that processes terabytes of data per day.
We’re looking for a Senior Data Scientist to join our Revenue Data Science team. You’ll work with a cross-functional team to understand and strengthen Mozilla’s financial health. You’ll have a chance to collaborate with folks from across the company and have a visible impact on our success....
Want to post a job here? Email us for details >> team@datascienceweekly.org
Training & Resources
How to put together a regression model
My students have been having trouble figuring out how to put together a regression model - what to include, what to interact, which variables are left-hand/right-hand. So I made a flowchart...
Bivariate dasymetric map in R
A disadvantage of choropleth maps is that they tend to distort the relationship between the true underlying geography and the represented variable...We will see how we can make a dasymetric map using raster data with a resolution of 100 m. This post will use data from census sections of the median income and the Gini index for Spain. We will make a dasymetric and bivariate map, representing both variables with two ranges of colours on the same map...
"MLOps for TinyML" by Daniel Situnayake [ Video ]
Talk given on Nov 2, 2020 for the internal Harvard offering of the Intro to TinyML course...Dan leads embedded machine learning engineering at Edge Impulse, a platform that allows developers to train machine learning models that run on tiny, low-power devices. He's co-author of the book TinyML: Machine Learning with TensorFlow Lite on Arduino and Ultra-Low-Power Microcontrollers, published by O'Reilly, and sits on the tinyML Foundation's community organizing committee. He has previously worked on the TensorFlow team at Google, as CEO of Tiny Farms Inc., and as a lecturer in Automatic Identification and Data Capture at Birmingham City University...
Books
Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits
Integrate scikit-learn with various tools such as NumPy, pandas, imbalanced-learn, and scikit-surprise and use it to solve real-world machine learning problems...
For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.
P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian