Topic: Re-Weighting the IMDB 250
Hey y'all. This came up in the chat, and I decided to fuck around with it and see what I could do:
So, the IMDB Top 250 seems to lean disproportionately towards recent films, so I decided to see if I could account for this somewhat. Disclaimer: It's a Saturday and I have been drinking, so I've not been super-rigorous about this, but hey, fuck it, what do you want from me?
So, here are the post-2000 films in the current top 25, as of today:
The Dark Knight (2008) at number 4.
The Lord of the Rings: The Return of the King (2003) and The Fellowship of the Ring (2001) at 9 and 11, respectively.
Inception (2010) at 14.
The Lord of the Rings: The Two Towers (2002) at 16.
Interstellar (2014! wtf?!?) at 20.
City of God (2002) at 22. (I do really like this one, though.)
Okay. If you take the 250 and plot a histogram of release years, you can see that there is a disproportionate number of post-90s films. Obviously, there are a lot of reasons for this, and one of them is that Gen-Y/Millennials are more internet-savvy, so they're going to be voting for more recent movies. Since a movie's score is weighted by the number of votes it gets, the list will tend to favour films with more votes (i.e. recent ones).
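A minimal sketch of the histogram-by-years step, in Python (my choice; the post doesn't say what tool was used). It buckets release years by decade and prints a crude text histogram. The function names are hypothetical, and the example years are just the post-2000 films listed above plus 12 Angry Men (1957), so this is illustrative only, not the real 250.

```python
from collections import Counter


def decade_histogram(years):
    """Count how many films fall in each decade (e.g. 1957 -> 1950), sorted by decade."""
    counts = Counter((y // 10) * 10 for y in years)
    return dict(sorted(counts.items()))


def print_histogram(hist):
    """Print one row of '#' per decade, one '#' per film."""
    for decade, n in hist.items():
        print(f"{decade}s {'#' * n}")


# Illustrative subset only: the post-2000 top-25 films above plus 12 Angry Men (1957).
years = [2008, 2003, 2001, 2010, 2002, 2014, 2002, 1957]
print_histogram(decade_histogram(years))
```

With the full list of 250 release years, the same two calls would show the post-90s pile-up directly.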
Disclaimer: there actually isn't a significant relationship (linear regression) between year and score, which is unexpected. However, this is complicated by the fact that all the scores are bounded between 8.2 and 9.2, and that there are 94 years' worth of movies (among other issues). Let's just continue as if this bias were detectable.
There IS a significant relationship between year and the number of films from that year that appear on the list, with a slope of about 5%.
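One way to run the two regressions mentioned above (year vs. score, and year vs. how many films from each year make the list) is plain ordinary least squares. This is a hedged sketch with hypothetical helper names, not the exact analysis from the post: it reports the slope and Pearson's r, but doesn't do a proper significance test. One assumption worth flagging: years with zero films on the list are counted as zeros, since dropping them would bias the frequency fit upward.

```python
import math
from collections import Counter


def ols(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x.

    Returns (slope, intercept, pearson_r). Raises ValueError if either
    variable has zero variance (the fit or r would be undefined).
    """
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("zero variance in xs or ys")
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


def films_per_year(years, first, last):
    """Number of list entries per year in [first, last], keeping zero-count years."""
    counts = Counter(years)
    xs = list(range(first, last + 1))
    return xs, [counts.get(y, 0) for y in xs]


# Year vs. frequency would be: ols(*films_per_year(release_years, 1921, 2014)),
# where release_years is the (hypothetical) list of all 250 release years.
```

For year vs. score, you'd pass the release years and the scores straight to `ols`.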
So, to some extent, year has a 5% contribution. My first attempt was to adjust each film's score by weighting it by year: the oldest film (1921) keeps its whole score, the newest films (2014) keep only 95% of theirs, and everything in between is scaled linearly.
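Here's a minimal Python sketch of that linear weighting, assuming the weight falls in a straight line from 1.0 in 1921 to 0.95 in 2014 and the list is then re-sorted by weighted score. The function names and the two films in the usage line are hypothetical (made-up titles and scores), not rows from the actual .csv.

```python
OLDEST, NEWEST = 1921, 2014
MAX_PENALTY = 0.05  # newest films keep 95% of their score


def linear_weight(year, oldest=OLDEST, newest=NEWEST, max_penalty=MAX_PENALTY):
    """1.0 for the oldest year, 1 - max_penalty for the newest, straight line in between."""
    frac = (year - oldest) / (newest - oldest)  # 0.0 .. 1.0
    return 1.0 - max_penalty * frac


def rerank(films, weight_fn):
    """films: list of (title, year, score). Returns (title, year, weighted_score), best first."""
    weighted = [(title, year, score * weight_fn(year)) for title, year, score in films]
    return sorted(weighted, key=lambda f: f[2], reverse=True)


# Hypothetical films: a 9.0 from 2014 becomes 8.55 and drops below an 8.6 from 1921.
print(rerank([("Film A", 2014, 9.0), ("Film B", 1921, 8.6)], linear_weight))
```

Passing the weighting in as a function means the same `rerank` works unchanged for any other curve.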
Slight improvement. Only 3 post-2000 films in the top 25, this time:
Return of The King dropped from 9 to 10.
Dark Knight from 4 to 13.
Fellowship from 11 to 19.
But I think a linear weighting works badly, since it also affects films made in the 60s/70s (the effect is small, but it still changes the ranking). So, we need a fairer weighting.
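A quick bit of arithmetic shows why. Under the linear scheme, every 10 years costs the same 0.05 × 10/93 ≈ 0.54% of a film's score, whether those years fall in the 60s or the 2000s. The sketch below uses two hypothetical films with the same raw score of 8.5, released in 1965 and 1975.

```python
def linear_penalty(year, oldest=1921, newest=2014, max_penalty=0.05):
    """Fraction of the score removed by the linear weighting for a given year."""
    return max_penalty * (year - oldest) / (newest - oldest)


raw = 8.5  # hypothetical raw score shared by both films
for year in (1965, 1975):
    print(year, round(raw * (1 - linear_penalty(year)), 3))
```

That works out to 8.299 vs. 8.253, a gap of roughly 0.046 points. With every score squeezed between 8.2 and 9.2, a gap that size is easily enough to reshuffle films from the same era.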
Here's the .csv of this linearly re-weighted top 250 if you wanna take a look:
http://www.filedropper.com/linearimdbtop250
Exponential weighting. Look, I'll be honest with y'all: dealing with exponential distributions when I'm sober and focused is bad enough as it is. It's trial-and-error at the best of times, and I don't really understand this "vector of quantiles" malarkey. And I have been drinking. So I just fucked around with the numbers until the curve and the axes looked reasonable. It's not perfect: I would've liked the penalty on post-2000s films to be higher and the slope on pre-2000s films to be shallower, but whatever, fuck it.
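For anyone who wants to try a better curve, here's one hedged option in Python. It is NOT the curve behind the .csv below (the post doesn't give that one). It's a hypothetical normalized exponential that still pins the weight at 1.0 in 1921 and 0.95 in 2014, but keeps the penalty tiny for older films and ramps it up near the end. The steepness `k` is a made-up tuning knob, and `k = 0` falls back to the linear weighting.

```python
import math

OLDEST, NEWEST = 1921, 2014


def exp_weight(year, max_penalty=0.05, k=5.0, oldest=OLDEST, newest=NEWEST):
    """Weight between 1 - max_penalty (newest year) and 1.0 (oldest year).

    The penalty follows (e^(k*t) - 1) / (e^k - 1) with t in [0, 1], so it is
    close to zero for most of the range and climbs steeply near `newest`.
    Larger k flattens the older years more; k == 0 is the straight-line case.
    """
    t = (year - oldest) / (newest - oldest)  # 0.0 for oldest, 1.0 for newest
    shape = t if k == 0 else math.expm1(k * t) / math.expm1(k)
    return 1.0 - max_penalty * shape


for year in (1921, 1957, 1975, 2000, 2008, 2014):
    print(year, round(exp_weight(year), 4))
```

Raising `max_penalty` would hit the post-2000s harder, and raising `k` would make the pre-2000s slope shallower, which are the two tweaks wished for above.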
Hmm. Your mileage may vary. I think it could do better with a different exponential weighting, but I can't be arsed. The Dark Knight and LOTR are still too goddamn high, but, whatever.
The important thing is that 12 Angry Men is closer to number 1.
Anyway, here's the .csv of the re-weighted top 250:
http://www.filedropper.com/exponentialimdbtop250
It's an interesting re-ordering, and it would arguably cause fewer arguments than the current 250. Maybe one day I'll try with a better exponential curve.
Disclaimer: I'm a professional, but not at - oh shit, wait, I am a professional at this. When I'm sober. I'm not a professional right now though.