Data Science Weekly - Issue 61
Issue #61 Jan 22 2015
Editor Picks
The Unreasonable Effectiveness Of Deep Learning
Talk by Yann LeCun, Head of Facebook AI Research...
Apple, Apps and Algorithmic Glitches
A data analysis of iTunes’ top chart algorithm...
How to Choose Between Learning Python or R First
If you’re interested in a career in data, and you’re familiar with the set of skills you’ll need to master, you know that Python and R are two of the most popular languages for data analysis. If you’re not exactly sure which to start learning first, you’re reading the right article...
Data Science Articles & Videos
Programming a Twitter bot – and the rescue from procrastination
It is fascinating to see how programmers use their creativity to repurpose Twitter's range and popularity. If you are familiar with R, such projects are well within your reach. In this post, I give a little demonstration of how to program your own Twitter bot using R...
The Difference Between Junior, Mid-Level, And Senior Data Scientist Jobs
When looking for a data science job, you will have to choose what job seniority to apply to. There are junior data science jobs, there are mid-level data science jobs, and there are senior data science jobs. An email subscriber recently asked how they should think about the different levels...
Mining a VC
Topic analysis of Fred Wilson's blog (one of the most popular NY VCs)...
Visualizing Representations: Deep Learning and Human Beings
The combination of neural networks and dimensionality reduction turns out to be a very interesting tool for visualizing high-dimensional data – a much more powerful tool than dimensionality reduction on its own. As we dig into this, we’ll observe what I believe to be an important connection between neural networks, visualization, and user interface....
How to understand the drawbacks of K-means
Great answer on the hidden assumptions of k-means, and when it fails...
Random Forests for the Social Sciences
Machine learning techniques gain in popularity in many disciplines and increased computational power allows for easy implementation of such algorithms. However, they are still widely considered as "black box" models that are not suited for substantive research. We present one such method, random forests, with emphasis on practical application for exploratory analysis and substantive interpretation...
Baidu built a supercomputer for deep learning
Chinese search engine company Baidu says it has built the world’s most-accurate computer vision system, dubbed Deep Image, which runs on a supercomputer optimized for deep learning algorithms...
The Cathedral of Computation
We’re not living in an algorithmic culture so much as a computational theocracy....
Most Read Data Science Articles of 2014
Our most read newsletter articles of 2014!...
Jobs
Data Scientist, Reliability - Tesla Motors - Fremont, CA
Tesla Motors uses proprietary technology, world-class design, and state-of-the-art manufacturing processes to create a new generation of highway capable electric vehicles. We utilize an innovative distribution model based on Company-owned sales and service centers. This approach allows us to maintain the highest levels of customer experience and benefit from short customer feedback loops to ensure our customer needs are fulfilled. The reliability engineering team is looking for a Data Scientist to join its team and exploit the benefits of machine learning and big data to improve reliability and delight our customers with exceptional vehicle quality...
Training & Resources
FAIR open sources deep-learning modules for Torch
Progress in science and technology accelerates when scientists share not just their results, but also their tools and methods. This is one of the reasons why Facebook AI Research (FAIR) is committed to open science and to open sourcing its tools. Today, we're open sourcing optimized deep-learning modules for Torch. These modules are significantly faster than the default ones in Torch and have accelerated our research projects by allowing us to train larger neural nets in less time...
Random Forests and Boosting in MLlib
Spark 1.2 introduces Random Forests and Gradient-Boosted Trees (GBTs) into MLlib. Suitable for both classification and regression, they are among the most successful and widely deployed machine learning methods...
Deep Learning vs. Neural Network Learning
Whiteboard Walkthrough...
Books
Mastering 'Metrics: The Path from Cause to Effect
Recent release: connects the dots between mathematical formulas, statistical methods, and real-world use cases...
"Clear, interesting and an enjoyable read. There is sufficient mathematical formulaic representations without overwhelming those who are new to the literature..."
For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.
P.S. Enjoyed the newsletter? Please forward it along to friends and colleagues - we'd love to have them on board! - All the best, Hannah & Sebastian