Data Science Weekly - Issue 132
Issue #132 - June 2, 2016
Editor Picks
The First Visual Search Engine for Scientific Diagrams
A machine-vision algorithm has learned to analyze and categorize scientific figures...
The Making Of A Machine Learning Cheatsheet: Emoji Edition
I've mentioned this before, but I really love emoji. Since I spend so much of my time communicating with friends and family on chat, emoji bring necessary animation to words that might otherwise look flat on the screen...Another thing I love is data science. The more I learn about machine learning algorithms, the harder it is to keep these subjects organized in my brain so I can recall them later. So, I decided to marry these two loves as productively as possible...
How The Toronto Symphony Orchestra Uses Graphic Design To Guide Its Audiences Through Its Music
The Toronto Symphony Orchestra’s ‘listening guides’ use symbols and Morse code-like notation to enhance the experience of a live performance. We talked to their creator, Hannah Chan-Hartley, about how she is helping the TSO to visualise its repertoire...
A Message from this week's Sponsor:
“The Science of Data-Driven Storytelling”
DataScience Inc. and the National Science Foundation’s West Big Data Innovation Hub have brought together leaders in academia, the non-profit sector, government, data science and publishing to discuss best practices for creating impactful data-driven stories. Click here to register for the live-streamed workshop, “The Science of Data-Driven Storytelling”.
Data Science Articles & Videos
State-of-the-Art AI: Building Tomorrow’s Intelligent Systems
Peter Norvig, Director of Research for Google, on developing state-of-the-art AI solutions for building tomorrow's intelligent systems...
Visualizing City Similarity
This blog post explains an alternative way to figure out how similar cities are. After you read it, you will realize why I think Madison and Reykjavik are very similar cities...
Accurate prediction of single-cell DNA methylation states using deep learning
Recent technological advances have enabled assaying DNA methylation in single cells. Current protocols are limited by incomplete CpG coverage and hence methods to predict missing methylation states are critical to enable genome-wide analyses. We here report DeepCpG, a computational approach based on deep neural networks to predict DNA methylation states from DNA sequence and incomplete methylation profiles in single cells...
How to build up a data team (everything I ever learned about recruiting)
During my time at Spotify, I’ve reviewed thousands of resumes and interviewed hundreds of people. Many of them were rejected, but many also got offers. I’ve also had my share of offers turned down by candidates...That being said, here are some things I learned from recruiting...
Finding Similar Sounding Names – Some Basics
Since my wife and I have a baby on the way, we've spent a lot of time thinking about names lately...After playing around with all of those baby naming tools, I recently took a stab myself and built a website that lets you find names that sound like ones you already like...For today's post, I'll simply be highlighting some of the algorithms I used to find words that sound similar, and how to implement them in SQL...
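The excerpt doesn't name the algorithms the post covers, so as a hedged illustration, here is one classic phonetic-matching algorithm, American Soundex, sketched in Python rather than the author's SQL. It keeps a name's first letter and encodes the following consonants as digits, so names that sound alike tend to share a four-character code. The function name `soundex` is our own; this is not the post's implementation.

```python
# Letter-to-digit groups of American Soundex; vowels, H, W and Y get no digit.
SOUNDEX_CODES = {
    ch: digit
    for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6"))
    for ch in letters
}

def soundex(name: str) -> str:
    """Return the Soundex code of a name: first letter plus three digits."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    result = [first]
    prev = SOUNDEX_CODES.get(first, "")
    for ch in letters[1:]:
        digit = SOUNDEX_CODES.get(ch, "")
        # Skip uncoded letters and adjacent repeats of the same digit.
        if digit and digit != prev:
            result.append(digit)
        # H and W don't break a run of equal codes; vowels (and Y) do.
        if ch not in "HW":
            prev = digit
    # Pad with zeros and truncate to exactly four characters.
    return ("".join(result) + "000")[:4]

print(soundex("Aiden"), soundex("Aidan"))
```

Because the first letter is kept verbatim, Soundex misses pairs like "Catherine" and "Katherine"; that kind of limitation is one reason to combine several similarity measures.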
Deep Reinforcement Learning: Pong from Pixels
RL is hot! You may have noticed that computers can now automatically learn to play ATARI games (from raw game pixels!), they are beating world champions at Go, simulated quadrupeds are learning to run and leap, and robots are learning how to perform complex manipulation tasks that defy explicit programming. It turns out that all of these advances fall under the umbrella of RL research...
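A core ingredient of policy-gradient methods, the family the post applies to Pong, is weighting each action's log-probability gradient by the discounted return that followed it. The sketch below assumes a hypothetical logistic policy P(UP | x) = sigmoid(x @ w) over a feature vector, a binary UP/DOWN action, and a Pong-specific reset of the running return whenever a nonzero reward (a point scored) occurs. The names `discount_rewards` and `policy_gradient_step` are illustrative, not taken from the post.

```python
import numpy as np

def discount_rewards(rewards, gamma=0.99):
    """Discounted return for each step, computed backwards: G_t = r_t + gamma * G_{t+1}."""
    returns = np.zeros(len(rewards), dtype=float)
    running = 0.0
    for t in reversed(range(len(rewards))):
        if rewards[t] != 0:
            running = 0.0  # Pong-specific assumption: a nonzero reward ends a rally
        running = running * gamma + rewards[t]
        returns[t] = running
    return returns

def policy_gradient_step(w, xs, actions, rewards, lr=1e-3, gamma=0.99):
    """One REINFORCE update for a hypothetical logistic policy P(UP | x) = sigmoid(x @ w).

    xs: (T, D) observations, actions: (T,) with 1 = UP and 0 = DOWN, rewards: (T,).
    """
    probs = 1.0 / (1.0 + np.exp(-(xs @ w)))
    adv = discount_rewards(rewards, gamma)
    adv = (adv - adv.mean()) / (adv.std() + 1e-8)  # standardize to reduce variance
    # For a Bernoulli policy, d log pi(action | x) / d logit = action - prob.
    grad = xs.T @ ((actions - probs) * adv)
    return w + lr * grad  # gradient ascent on expected return
```

Standardizing the returns is a common variance-reduction choice: within a batch, actions followed by better-than-average outcomes are made more likely and the rest less likely.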
A Car’s Computer Can ‘fingerprint’ You In Minutes Based On How You Drive
The way you drive is surprisingly unique. And in an era when automobiles have become data-harvesting, multi-ton mobile computers, the data collected by your car—or one you rent or borrow—can probably identify you based on that driving style after as little as a few minutes behind the wheel...
What is software engineering for data science?
One question you’ll find yourself asking is at what point you need to systematize common tasks and procedures across projects, rather than recreating code or writing new code from scratch for every new project. The answer depends on a variety of factors and may require communication within your team and with people outside it...
Jobs
Data Scientist - MediaMath - New York
MediaMath is a technology platform that brings together all forms of digital media, massive amounts of data, and sophisticated algorithms to power smarter marketing for the world’s leading advertisers. We’ve enjoyed massive growth since our founding in 2007, and we’re now a global company with offices in 20 locations – but this revolution has just begun! We are currently looking for a Data Scientist to support the ongoing development of MediaMath’s proprietary algorithms and analytics...
Training & Resources
Non-Metric Space Library (NMSLIB)
Non-Metric Space Library (NMSLIB): A similarity search library and a toolkit for evaluation of k-NN methods for generic non-metric spaces...
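To make "k-NN in a non-metric space" concrete: KL divergence between probability distributions is asymmetric and does not satisfy the triangle inequality, so index structures that rely on metric properties don't apply directly. The brute-force baseline below is a hypothetical sketch, not NMSLIB's API; it is the exact search that a library like NMSLIB aims to speed up. The function names and the small `eps` smoothing are our own assumptions.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for discrete distributions; asymmetric, so not a metric."""
    p = np.asarray(p, dtype=float) + eps  # eps avoids log(0) and division by zero
    q = np.asarray(q, dtype=float) + eps
    return float(np.sum(p * np.log(p / q)))

def knn_bruteforce(query, data, k, dist=kl_divergence):
    """Indices of the k items x in data with the smallest dist(x, query)."""
    # For an asymmetric distance, dist(x, query) and dist(query, x) can rank
    # neighbors differently, so the argument order is a deliberate choice.
    dists = np.array([dist(x, query) for x in data])
    return np.argsort(dists, kind="stable")[:k].tolist()

data = [[0.5, 0.5], [0.9, 0.1], [0.1, 0.9]]
print(knn_bruteforce([0.8, 0.2], data, k=2))
```

Brute force costs one distance evaluation per stored item for every query, which is exactly the cost that approximate k-NN indexes try to avoid.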
How to Build a Grouped Bar Chart in D3
In this tutorial, you will use the CSV data from the D3js.org website's Grouped Bar Chart example to see how a complete D3 grouped bar chart visualization is built...
Introducing our Hybrid lda2vec Algorithm
The goal of lda2vec is to make volumes of text useful to humans (not machines!) while still keeping the model simple to modify. It learns the powerful word representations in word2vec while jointly constructing human-interpretable LDA document representations...
Books
Introducing Data Science: Big Data, Machine Learning, and more, using Python tools
Introducing Data Science explains vital data science concepts and teaches you how to accomplish the fundamental tasks that occupy data scientists. You’ll explore data visualization, graph databases, the use of NoSQL, and the data science process. You’ll use the Python language and common Python libraries as you experience firsthand the challenges of dealing with data at scale. Discover how Python allows you to gain insights from data sets so big that they need to be stored on multiple machines, or from data moving so quickly that no single machine can handle it. This book gives you hands-on experience with the most popular Python data science libraries, Scikit-learn and StatsModels...
For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.
P.S. Interested in reaching fellow readers of this newsletter? Consider sponsoring! Email us for details :) - All the best, Hannah & Sebastian