Data Science Weekly - Issue 34
Issue #34, July 17, 2014
Editor Picks
Algorithm identifies rare genetic disorders from family pics
Oxford University researchers have developed a computer program that can diagnose rare genetic disorders in children simply by analysing regular photographs...
Deep Learning: A Year in Review
Major events and commentary...
Why Probabilistic Programming Matters
Last week, DARPA announced a new program to fund research in probabilistic programming languages. While the accompanying news stories gave a certain angle on why this was important, this is a new research area and it's still pretty mysterious to most people who care about machine intelligence. So: what is probabilistic programming, and why does it matter?...
Data Science Articles & Videos
After 9 yrs of research, Numenta has apps that mimic how the brain works
Jeff Hawkins and Donna Dubinsky started Numenta 9 years ago to create software that was modeled after the way the human brain processes information. It has taken longer than expected, but the Redwood City-based startup recently held an open house to show how much progress it has made...
Understanding Convolutions
In a previous post, we built up an understanding of convolutional neural networks, without referring to any significant mathematics. To go further, however, we need to understand convolutions...
Microsoft Challenges Google’s Artificial Brain With ‘Project Adam’
Microsoft’s research arm says it has achieved new records with a deep learning system it calls Adam...
Statistical Advice for A/B Testing
A/B testing is awesome. It's fun, it's lucrative, and it's one of the most visible and impactful things that you can do as a data scientist / statistician / anyone-interested-in-optimization at a company. Unfortunately, good statistical methods for A/B testing are more complicated than they are sometimes thought to be...
Data for Good: Hacker News for showing off world-changing Data Science
Data scientists already have DataTau, a specialized version of Hacker News, that popular online water cooler for programmers. But some socially minded data scientists thought DataTau wasn’t great enough, and now they have their own site. Data for Good...
What is deep learning, and why should you care?
I’ve never encountered such a big improvement from a technique [Deep Learning] that was largely unheard of just a couple of years before, so I became obsessed with understanding more. To be able to use it commercially across hundreds of millions of photos, I built my own specialized library to efficiently run prediction on clusters of low-end machines and embedded devices, and I also spent months learning the dark arts of training neural networks. Now I’m keen to share some of what I’ve found...
Complementary Approaches to Forecasting Political Events
Advances in technology and the popularity of individuals like Nate Silver have given rise to the exciting idea that political scientists can predict the future using statistical models. Despite the recent attention forecasting has received, it is still difficult to do well, especially for rare political events like the onset of mass atrocities. In order to address this challenge, the Early Warning Project has developed a system that combines statistical forecasting with crowd-sourced forecasts...
About Feature Scaling and Normalization, and the effect of standardization for machine learning algorithms
I received a couple of questions in response to my previous article (Entry point: Data) where people asked me why I used Z-score standardization as a feature scaling method prior to the PCA. I added additional information to the original article; however, I thought that it might be worthwhile to write a few more lines about this important topic in a separate article...
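For readers who have not met Z-score standardization, here is a minimal sketch of the idea: each feature is shifted to zero mean and rescaled to unit standard deviation, so that features measured in large units do not dominate a variance-based method such as PCA. The function name `zscore` and the sample values are our own illustration, not taken from the article, and the sketch assumes the population standard deviation is used.

```python
from statistics import mean, pstdev

def zscore(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant feature carries no variance and cannot be rescaled.
        raise ValueError("cannot standardize a constant feature")
    return [(v - mu) / sigma for v in values]

# Hypothetical feature measured in centimetres.
heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
scaled = zscore(heights_cm)
print(scaled)
```

In practice each column of a dataset would be standardized separately before the PCA step.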
Music Recommendations with 300M Data Points and one SQL Query
While toying with the public BigQuery datasets, impatiently waiting for Google Cloud Dataflow to be released, I’ve noticed the Wikipedia Revision History one, which contains a list of 314M Wikipedia edits, up to 2010. In the spirit of Amazon’s “people who bought this”, I’ve decided to run a small experiment about music recommendations based on Wikipedia edits. The results are not perfect, but provide some insights that could be used to bootstrap a recommendation platform...
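The "people who bought this" idea can be sketched in a few lines: two articles are treated as related when the same editors have worked on both. The sketch below is in Python rather than the single SQL query the author describes, and the editor names, article names, and function names are all hypothetical; it illustrates co-occurrence counting in general, not the author's actual method.

```python
from collections import Counter, defaultdict
from itertools import combinations

def co_edit_counts(edits):
    """Count, for each pair of articles, how many distinct editors touched both.

    edits is an iterable of (editor, article) pairs.
    """
    articles_by_editor = defaultdict(set)
    for editor, article in edits:
        articles_by_editor[editor].add(article)
    counts = Counter()
    for articles in articles_by_editor.values():
        # Sort so each pair is always stored in the same order.
        for a, b in combinations(sorted(articles), 2):
            counts[(a, b)] += 1
    return counts

def recommend(article, counts, top_n=3):
    """Return the articles most often co-edited with the given one."""
    scores = Counter()
    for (a, b), n in counts.items():
        if a == article:
            scores[b] += n
        elif b == article:
            scores[a] += n
    return [name for name, _ in scores.most_common(top_n)]

# Hypothetical edit log: (editor, article) pairs.
edits = [
    ("u1", "Radiohead"), ("u1", "Portishead"),
    ("u2", "Radiohead"), ("u2", "Portishead"), ("u2", "Blur"),
    ("u3", "Blur"), ("u3", "Oasis"),
]
counts = co_edit_counts(edits)
print(recommend("Radiohead", counts))
```

Raw co-occurrence counts favour very popular articles, so a real system would likely normalize the scores in some way.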
Jobs
Data Scientist - Shutterstock - New York, NY
As a Data Scientist, you will be joining the team responsible for pushing technology boundaries in areas such as language translation, image recognition, natural language processing, and search ranking. Your work will directly empower the Shutterstock customer experience seen by millions of customers daily, and will enable new and unique customer features that drive Shutterstock's best-in-class image and video search engine...
Training & Resources
Awesome-machine-learning
A curated list of awesome Machine Learning frameworks, libraries and software...
CMU Machine Learning Summer School
All CMU Machine Learning Summer School lecture videos online now...
8 great data blogs to follow
Below I've listed my favourite data analysis, data science, or otherwise technical blogs that I've learned a great deal from...
Books
The Lady Tasting Tea: How Statistics Revolutionized Science in the Twentieth Century
An insightful, revealing history of how mathematics transformed our world...
"I have taken courses in statistics, taught it many times and solved several statistical problems that have appeared in journals. But until I read this book, I never really thought about it in so deep and philosophical a manner..."
For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.
P.S. Did you enjoy the newsletter? Do you have friends/colleagues who might like it too? If so, please forward it along - we would love to have them onboard :)