Our second project, Project Luther, was to collect movie data by scraping publicly available websites such as Box Office Mojo, IMDb, and Rotten Tomatoes. After forming a question, we scraped the data needed to answer it.
Check out this project's GitHub repo!
Being a musician (and a huge fan of movie music), my question was whether having one of the top 5 composers makes a difference in a movie's Domestic Total Gross. This YouTube video does a good job of illustrating the importance of music in movies:
So it's quite apparent that music is very important in a movie, but is it important to have iconic, fitting music? Or is just any music fine?
Okay, maybe that was a little extreme, but maybe just having a composer writing music doesn't translate to the same gross as if you had one of the top 5. For a little bit of context, the average gross per movie for my dataset was around $82,650,000.
In comparison, the average gross for movies of the top 5 composers:
1. Hans Zimmer: 110 movies averaging $95,800,000
2. John Williams: 70 movies averaging $142,990,000
3. James Newton Howard: 117 movies averaging $71,860,000
4. Danny Elfman: 77 movies averaging $93,050,000
5. Alan Silvestri: 88 movies averaging $76,840,000
It seems that 3 of the top 5 composers (Hans Zimmer, John Williams, and Danny Elfman) have an average movie gross higher than the overall average of the movies I looked at. However, something to take into consideration is that perhaps you can afford a good composer because your budget is high, and the high budget may be the real reason the movie was good and made more money. That means I definitely had to put budget into the prediction model to account for it.
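The comparison against the overall average can be checked with a few lines of Python. This is only a sketch that re-uses the figures listed above; the names `composers`, `OVERALL_AVERAGE`, and `above_overall` are made up for illustration and are not from the project code.

```python
# Figures copied from the list above: (number of movies, average gross in USD).
composers = {
    "Hans Zimmer": (110, 95_800_000),
    "John Williams": (70, 142_990_000),
    "James Newton Howard": (117, 71_860_000),
    "Danny Elfman": (77, 93_050_000),
    "Alan Silvestri": (88, 76_840_000),
}

# Approximate average gross per movie for the whole dataset, as quoted above.
OVERALL_AVERAGE = 82_650_000


def above_overall(stats, overall):
    """Return the composers whose average gross exceeds the overall average."""
    return [name for name, (_, avg) in stats.items() if avg > overall]


above = above_overall(composers, OVERALL_AVERAGE)

for name, (count, avg) in composers.items():
    diff = avg - OVERALL_AVERAGE
    print(f"{name}: {diff:+,} USD vs. the overall average ({count} movies)")

print(f"{len(above)} of {len(composers)} composers are above the overall average")
```

Keep in mind this only compares raw averages. It says nothing about budget, which is why budget still has to go into the prediction model.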
The Data Scraping
Step one was to scrape what data I could. Using Beautiful Soup, I collected the URLs of all the Box Office Mojo top 100 pages from 1996 to 2016 (present) and went through the individual movie pages to find the information I could use in my prediction model. Thankfully, Box Office Mojo URLs have a very systematic and interpretable pattern. Here is an excerpt of the scrape code to find the relevant individual movie Box Office Mojo page URLs:
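    # Get URLs of top 100 grossing movies per year from 1996 to 2016
    ... = ...
    ... = ...
    for ... in ...:
        ... = '' + ... + '&p=.htm'
        ... = requests. ...
        ... = response.t ...
        ... = ...
        ... = soup. ...
        ... = re. ...
        for ... in ...:
            movie_urls. ...
    for ... in ...:
        if 'starwars' in ...:
            ... = movie. ...
            movie_urls. ...
            movie_urls. ...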
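To make the idea concrete, here is a minimal sketch of the two steps described above: building one URL per year, then pulling movie-page links out of a yearly page. It is not the project's actual code. The URL template (apart from the `&p=.htm` suffix), the link pattern `/movies/?id=`, the sample HTML, and every function name are assumptions made for illustration. To stay dependency-free it also swaps in the standard library's `html.parser` for Beautiful Soup; with Beautiful Soup the same filter could be written as `soup.find_all('a', href=re.compile(...))`.

```python
import re
from html.parser import HTMLParser

# Hypothetical template: only the '&p=.htm' suffix comes from the excerpt above.
YEARLY_URL_TEMPLATE = "https://example.com/yearly/chart/?yr={year}&p=.htm"


def yearly_urls(first_year=1996, last_year=2016):
    """Build one chart URL per year, inclusive of both end years."""
    return [YEARLY_URL_TEMPLATE.format(year=year)
            for year in range(first_year, last_year + 1)]


class LinkCollector(HTMLParser):
    """Collect unique href values of <a> tags that match a regex."""

    def __init__(self, pattern):
        super().__init__()
        self.pattern = re.compile(pattern)
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Keep first occurrences only, so repeated links are not stored twice.
        if href and self.pattern.search(href) and href not in self.links:
            self.links.append(href)


def movie_links(html, pattern=r"/movies/\?id="):
    """Return movie-page links found in one page (pattern is an assumption)."""
    collector = LinkCollector(pattern)
    collector.feed(html)
    return collector.links


# Made-up stand-in for a downloaded yearly page (what response.text would hold).
sample_html = """
<table>
  <tr><td><a href="/movies/?id=example1.htm">Example One</a></td></tr>
  <tr><td><a href="/movies/?id=example2.htm">Example Two</a></td></tr>
  <tr><td><a href="/studio/chart/?studio=example.htm">Some Studio</a></td></tr>
  <tr><td><a href="/movies/?id=example1.htm">Example One (repeat)</a></td></tr>
</table>
"""

print(len(yearly_urls()), "yearly pages to visit")
print(movie_links(sample_html))
```

In a real run, each URL from `yearly_urls()` would be fetched with `requests.get(url)` and the page text passed to `movie_links`. The sample HTML stands in for that step so the sketch runs without network access.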