Data Science Weekly - Issue 150
Issue #150 Oct 6 2016
Editor Picks
A Dramatic Tour through Python’s Data Visualization Landscape (including ggplot and Altair)
I recently came upon Brian Granger and Jake VanderPlas’s Altair, a promising young visualization library...Thus, I’m using my discovery of Altair as an opportunity to step back — to investigate how Python’s statistical visualization options hang together. I hope this investigation proves helpful for you as well...
Deep-Fried Data
Today I'm here to talk to you about machine learning. I'd rather you hear about it from me than from your friends at school, or out on the street...Machine learning is like a deep-fat fryer. If you’ve never deep-fried something before, you think to yourself: “This is amazing! I bet this would work on anything!”...And in any deep frying situation, a good question to ask is: what is this stuff being fried in?...
Keynote Session: Dr. Edward Tufte - The Future of Data Analysis
Data analysis seeks to learn from experience. Better inferences require better thinking and better tools. Practical advice about how to make more credible conclusions based on data. What we can expect in the future, and what we should aspire to in the future...
A Message from this week's Sponsor:
Harness the business power of big data.
How far could you go with the right experience and education? Find out at Capitol Technology University. Earn your PhD in Management & Decision Sciences in as little as three years, through convenient online classes. Banking, healthcare, energy and business all rely on insightful analysis, and business analytics spending will grow to $89.6 billion in 2018. This is a tremendous opportunity, and Capitol’s PhD program will prepare you for it. Learn more now!
Data Science Articles & Videos
The Simpsons by the Data
Analysis of 27 seasons of Simpsons data reveals the show’s most significant side characters, a pattern of patriarchy, declining TV ratings, and more...
Three Challenges for Artificial Intelligence in Medicine
Why is the world’s most advanced AI used for cat videos, but not to help us live longer and healthier lives? A brief history of AI in Medicine, and the factors that may help it succeed where it has failed before...
Analysis of Farmers Markets
A series of data visualizations on Farmers' Market data from data.gov...
Automatically Grading Multiple Choice Exams From Photos With Python
Over the past few months I’ve gotten quite the number of requests landing in my inbox to build a bubble sheet/Scantron-like test reader using computer vision and image processing techniques... So here is a bubble sheet multiple choice scanner and test grader using OMR, Python and OpenCV...
Perceptual Losses for Real-Time Style Transfer and Super-Resolution
fast-neural-style: The paper builds on A Neural Algorithm of Artistic Style by Leon A. Gatys, Alexander S. Ecker, and Matthias Bethge by training feedforward neural networks that apply artistic styles to images. After training, our feedforward networks can stylize images hundreds of times faster than the optimization-based method presented by Gatys et al...
Multiple Narrative Disentanglement: Unraveling Infinite Jest
Many works (of both fiction and non-fiction) span multiple, intersecting narratives, each of which constitutes a story in its own right. In this work I introduce the task of multiple narrative disentanglement (MND), in which the aim is to tease these narratives apart by assigning passages from a text to the sub-narratives to which they belong. The motivating example I use is David Foster Wallace’s fictional text Infinite Jest...
Replication in Data Science - A Dance Between Data Science & Machine Learning
We use Iterative Supervised Clustering as a simple building block for exploring Pinterest's content. But simplicity can unlock great power: with this building block we show the shocking result of how hard it is to replicate data science conclusions. This raises the question: When is Data Science a House of Cards?...
#TrumpWon? trend vs. reality. A deep dive into the underlying data
Why is everyone so obsessed with this hashtag and the fact that it was in Twitter’s trending topics list the morning after the first presidential debate? Perhaps the competitive nature of a presidential debate (the fact that there’s supposed to be a “winner”) means that we’re reading into any available data point. Or maybe it is due to the nature of this specific election cycle, where facts seem to have become subjective as people in online echo chambers consume what they want to believe...
Jobs
Data Scientist, Growth - Coursera - Mountain View, CA
Coursera is scaling a global platform to provide universal access to the world’s best education, and we’re driven by the passion and mission to let people learn without limits. We use data to drive our products and our business, and to better serve our learners.
We’re looking for a talented, creative, and driven data scientist with a sharp eye for UX design, strong algorithmic and analytic skills, and an interest in expanding the reach and quality of online education by improving our discovery experiences. Our ideal candidate is an independent, analytically-minded individual with strong software engineering and statistical modeling skills, who shares our passion for education. In this role, you’ll be directly involved in the development, implementation, and evaluation of discovery products, including search and personalized recommendations...
Training & Resources
Hadoop architectural overview
In this post, we’ll explore each of the technologies that make up a typical Hadoop deployment, and see how they all fit together...
Creating A Realtime Analytics & Event Processing System For Big Data Using Amazon Kinesis
This tutorial is aimed at engineers, developers and architects who would like to build a realtime analytics and event processing system for large amounts of data collected from multiple sources: IoT, logs from servers, routers, distributed processing systems...
An Introduction to Machine Learning in Julia
In this post, we introduce a simple machine learning algorithm called K Nearest Neighbors, and demonstrate certain Julia features that allow for its easy and efficient implementation. We will demonstrate that the code we write is inherently generic, and show the use of the same code to run on GPUs via the ArrayFire package...
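For readers new to K Nearest Neighbors, the core idea fits in a few lines. Below is a minimal sketch in Python, not the post's Julia code and without the GPU/ArrayFire part, assuming Euclidean distance and a simple majority vote among the k closest training points. The function name `knn_predict` and the toy data are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Hypothetical helper: classify `query` by majority vote of its k nearest neighbors.

    Assumes Euclidean distance (math.dist, Python 3.8+); ties in the vote
    go to the label encountered first among the nearest neighbors.
    """
    # Distance from the query to every training point, paired with its label
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep only the k closest training points
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote among the labels of those neighbors
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy data: two well-separated clusters labeled "a" and "b"
train = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(train, labels, (0.5, 0.5), k=3))  # prints "a"
print(knn_predict(train, labels, (5.5, 5.5), k=3))  # prints "b"
```

This brute-force version computes every distance for each query, which is exactly the kind of simple, vectorizable loop the post says Julia (and GPU arrays) can speed up.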
Books
The Second Machine Age: Work, Progress, and Prosperity in a Time of Brilliant Technologies
In The Second Machine Age, MIT’s Erik Brynjolfsson and Andrew McAfee, two thinkers at the forefront of their field, reveal the forces driving the reinvention of our lives and our economy...
For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.
P.S. Interested in reaching fellow readers of this newsletter? Consider sponsoring! Email us for details :) - All the best, Hannah & Sebastian