ReproduceIt: FiveThirtyEight - The Three Types Of Adam Sandler Movies

ReproduceIt is a series of articles that reproduce the results of data analysis articles, with a focus on open data and open code. All the code and data are available on GitHub: reproduceit-538-adam-sandler-movies. This post is a more verbose version of that content and will probably become outdated, while the GitHub version may be updated with fixes.

I am a fan of FiveThirtyEight and of how they base most of their articles on data analysis. I am also a fan of how they open source a lot of their code and data on GitHub. The ReproduceIt series is heavily based on their work.

In this first ReproduceIt article I am going to try to reproduce the analysis Walt Hickey did for the article "The Three Types Of Adam Sandler Movies". That article is a simple data analysis of Adam Sandler movies, and FiveThirtyEight didn't provide any code or data for it, so I think it is a nice opportunity to start this series of posts.

The other objective of these posts is to learn something new. In this case I made my first ever Bokeh plots, which are interactive. This makes it easier to explore the movie data, something that unfortunately is not possible with the static image in the original article.

Let's start by printing the version of the language (Python in this case) and of the libraries used in this analysis. An environment.yml is available in the GitHub repo, so you can easily create a conda environment from it.
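The same version information can also be gathered in a single step. The following is a minimal sketch and is not part of the original notebook: the helper name collect_versions is hypothetical, and it assumes each library exposes a `__version__` attribute, which holds for the ones used here. Libraries that are not installed are reported instead of raising an error.

```python
import importlib
import sys


def collect_versions(module_names):
    """Return a dict mapping each module name to its version string.

    Hypothetical helper: modules that cannot be imported map to None,
    and modules without a __version__ attribute map to "unknown".
    """
    versions = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
            continue
        versions[name] = getattr(module, "__version__", "unknown")
    return versions


# Python version as "major.minor.micro"
python_version = ".".join(str(part) for part in sys.version_info[:3])

# The libraries imported in this post
library_versions = collect_versions(
    ["requests", "bs4", "numpy", "pandas", "sklearn", "bokeh"]
)

print("python", python_version)
for name, version in sorted(library_versions.items()):
    print(name, version or "not installed")
```

Comparing this output with the versions shown in the cells that follow is a quick way to check whether a local environment matches the one used for the analysis.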

In [1]:
import sys
sys.version_info
Out[1]:
sys.version_info(major=3, minor=4, micro=3, releaselevel='final', serial=0)
In [2]:
import requests
requests.__version__
Out[2]:
'2.6.2'
In [3]:
import bs4
from bs4 import BeautifulSoup
bs4.__version__
Out[3]:
'4.3.2'
In [4]:
import numpy as np
np.__version__
Out[4]:
'1.9.2'
In [5]:
import pandas as pd
pd.__version__
Out[5]:
'0.16.0'
In [6]:
from sklearn.cluster import KMeans

import sklearn
sklearn.__version__
Out[6]:
'0.16.1'
In [7]:
import bokeh.plotting as plt
from bokeh.models import HoverTool
plt.output_notebook()

import bokeh
bokeh.__version__