ReproduceIt

April 30th, 2015

I have been wanting to restart my blog for a while now. In 2013 I wrote around 20 posts, but in 2014 I only wrote 3, and this is my first post in 2015, and it’s not even a real one. The last one was more than 6 months ago. Time is one of the reasons, but even when I have a couple of hours to kill that I would like to spend blogging, ideas don’t always come to me easily.

Last weekend I attended PyData Dallas 2015, and some of the talks about data journalism and open data gave me a simple but, I believe, effective idea to get me writing new posts.

The idea is to take articles from the internet that do some kind of data analysis and reproduce the results. In most cases the data and analysis are not published, but even if the code is available, as it is for some articles on fivethirtyeight.com, that’s no reason not to try to reproduce it and learn something in the process.

All the code and data will be publicly available, to make the results as reproducible as possible.

I am not that interested in stealing the conclusions an article might reach, but in having a reproducible analysis that is publicly available.

While reproducing the results will be objective #1, I believe these simple projects are the perfect opportunity to try a new library, a new statistical technique, or even a new language that you have been wanting to try but cannot really use at your real job, or never had a particular idea to try it on.

The idea is not restricted to internet articles; it could also be applied to an academic paper or any kind of data project. Personally, I’ll start with simple articles that I have read during the week and that can be reproduced in a couple of afternoons, so I can publish one every week. Even though I’ll probably fail after the first one.