Data Science Weekly - Issue 168
Issue #168 Feb 9 2017
Editor Picks
A Litany of Problems With p-values
In my opinion, null hypothesis testing and p-values have done significant harm to science. The purpose of this note is to catalog the many problems caused by p-values. As readers post new problems in their comments, more will be incorporated into the list, so this is a work in progress...
Can joining a social network prompt us to do more exercise?
Stanford computer scientists find that people who used a smartphone app to track exercise activity walked a little more each day once they joined the app’s optional social network...
Using Machine Learning to Predict Parking Difficulty
Much of driving is spent either stuck in traffic or looking for parking. With products like Google Maps and Waze, it is our long-standing goal to help people navigate the roads easily and efficiently. But until now, there wasn’t a tool to address the all-too-common parking woes...
A Message from this week's Sponsor:
Get hired as a data scientist
In data science, one size does not fit all. That's why Thinkful's customizable curriculum is so critical. Learn Python tools to collect & analyze data, statistics and probability to form the right framework, and machine learning to create models that predict the future. View the Flexible Data Science Bootcamp syllabus to learn more about our online, part-time bootcamp.
Data Science Articles & Videos
Mark Cuban on Why You Need to Study AI or You’ll be a Dinosaur in 3 Years
Mark Cuban opened the Upfront Summit in an epic interview by Jason Hirschhorn, founder of Media REDEF. They discussed many topics, ranging from protection of the press to what Mark looks for in an entrepreneur to investing outside of Silicon Valley and, of course, Trump and sports. But perhaps the most insightful was their discussion about Machine Learning / AI...
Clustering Similar Stories Using LDA
With our scale of millions of articles and constant stream of documents, it’s impossible to generate roundups manually. So, we have developed a clustering algorithm that’s both fast and scalable, and in this blog post, I will explain how we create these roundups on Flipboard...
Intro to Data Science for Academics
Some good thoughts for people trying to get into data science...
Applying Bayes Theorem: Simulating the Monty Hall Problem with Python
In this post, I will examine the theoretical probability of each selection, and then I will use Python to test and prove the theory...
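As a rough illustration of the kind of simulation that post describes (a minimal sketch, not the linked author's code), the following plays the game many times and compares the two strategies. Theory says staying wins 1/3 of the time and switching wins 2/3. The function names `play` and `win_rate`, the trial count, and the seed are assumptions chosen for this example.

```python
import random


def play(switch, rng):
    """Play one round; return True if the contestant ends up with the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)    # door hiding the car
    pick = rng.choice(doors)   # contestant's first choice
    # The host opens a door that is neither the contestant's pick nor the car.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Switch to the single door that is still closed and not yet picked.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car


def win_rate(switch, trials=100_000, seed=0):
    """Estimate the win probability of a strategy by repeated play."""
    rng = random.Random(seed)  # seeded so repeated runs give the same estimate
    wins = sum(play(switch, rng) for _ in range(trials))
    return wins / trials


if __name__ == "__main__":
    print("stay:  ", win_rate(switch=False))
    print("switch:", win_rate(switch=True))
```

A simulation like this gives empirical support for the theoretical probabilities rather than a proof; the estimates should land close to 1/3 and 2/3 and tighten as `trials` grows.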
The Power of Big Data and Psychographics
In a 10 minute presentation at the 2016 Concordia Summit, Mr. Alexander Nix discusses the power of big data in global elections. Cambridge Analytica’s revolutionary approach to audience targeting, data modeling, and psychographic profiling has made them a leader in behavioral microtargeting for election processes around the world...
Build a super fast deep learning machine for under $1,000
Yes, you can run TensorFlow on a $39 Raspberry Pi, and yes, you can run TensorFlow on a GPU powered EC2 node for about $1 per hour. And yes, those options probably make more practical sense than building your own computer. But if you’re like me, you’re dying to build your own fast deep learning machine...
Color Quantization Using K-Means
In this article I'd like to talk about color quantization, how the k-means clustering algorithm can be used to perform it, and how it performs compared to simpler methods...
Visualizing Time-Series Change
To evaluate the different methods for visualizing change, I chose to examine population data from the three major North American countries...
Jobs
Sr. Data Scientist - NASA Ames Research Center - Mountain View, CA
INNOVATOR wanted: 💡 Artificial Intelligence (AI) Data Mining Development...
Training & Resources
On the intuition behind deep learning & GANs: towards a fundamental understanding
A generative adversarial network (GAN) is composed of two separate networks - the generator and the discriminator. It poses the unsupervised learning problem as a game between the two. In this post we will see why GANs have so much potential, and frame GANs as a boxing match between two opponents...
Introduction to ggraph: Layouts
I will soon submit ggraph to CRAN - I swear! But in the meantime I’ve decided to build up anticipation for the great event by publishing a range of blog posts describing the central parts of ggraph: Layouts, Nodes, Edges, and Connections. All of these posts will be included with ggraph as vignettes — potentially in slightly modified form. To kick off everything we’ll start with the first thing you’ll have to think about when plotting a graph structure...
Using functional programming in Python like a boss: Generators, Iterators and Decorators
I'm a simple man. I see functions, I compose them...
Books
Bayes Theorem: A Visual Introduction For Beginners
"This book takes what can be a daunting and complex subject and breaks it down with a series of easy to follow examples which build up to deliver a great overall explanation of how to use Bayes Theorem for basic analysis and even off-the-cuff critical thinking"...
For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.
P.S. Interested in reaching fellow readers of this newsletter? Consider sponsoring! Email us for details :)
All the best, Hannah & Sebastian