Forum for discussion about the Netflix Prize and dataset.
hi all, the NYT article is well written & Im thinking, "finally!!" .. a nice, thorough mainstream article on the contest that captures some of the zeitgeist & sizzle...
The Screens Issue
If You Liked This, You’re Sure to Love That
http://www.nytimes.com/2008/11/23/magaz … lix-t.html
however, the article makes a very subtle/glaring (in my view) "non sequitur" at a certain point, which I suspect might be the way to a major breakthru.
clive thompson remarks how one contestant has a major problem with the movie "napoleon dynamite", that it accounts for 15% of the remaining gap between his score and the $1M prize. (not exactly sure how that is computed-- that alone is a tricky computation).
but, thompson *assumes* that other contestants have the same problem with the same movie.
but, at this moment, it seems likely to me that this jumps the gun somewhat and is actually an intriguing open question.
let me spell this out.
to what degree do different teams have problems with the same movies? ie is there something intrinsically difficult about certain movies as is the major theme of the article, or is this actually an assumption? I dont see the author giving much evidence for his idea about "hard" movies except that various teams agree that some movies are hard. but that is a far different assertion than that the *same* movies are hard. are we sure the teams would agree on what movies are hard?
this is easy to answer as long as teams cooperate in a way that probably would not jeopardize any individual team. I propose the following.
let teams just post on their own web sites the RMSE listed BY MOVIE. in other words, the RMSE computed only over the ratings for individual movies. (note this could be over the quiz and/or test data. I am aware that there are very few predictions for some movies.)
the size of the data is only 17770 data points (one per movie).
arguably, this gives away no sensitive information. (Im sure others will argue differently, but as long as some agree to the idea, there can be some exchange...)
moreover, seems like this would answer some very big questions about how the algorithms are working relative to each other that could help all the teams with their own algorithms but also, in particular, reveal fruitful collaborations, which seem to have been powerful and/or critical, maybe even indispensable, in increasing the top scores to date.
some good algorithms might also be able to be devised that could suggest combinations of contestant algorithms and the potential score resulting from them.
what I propose is just one idea; surely there are other useful statistics that teams might agree to share/standardize that would not jeopardize individual teams or algorithms.
sounds like a promising approach...
any takers?
Last edited by vzn (2008-11-22 15:44:44)
Offline
It looks like I'm seeing the same patterns discussed in that NY Times article. Filtering out movies with <1000 predictions in probe.txt (to get a decent average), then sorting by per-movie RMSE, the top movies I get are:
RMSE SqErr NumPts MSE MovieID Title
=================================================================================
1.2589 8379.240 5287 1.5849 3151 Napoleon Dynamite
1.2385 2696.370 1758 1.5338 14890 Team America: World Police
1.2291 2122.580 1405 1.5107 11022 Fahrenheit 9/11
1.1565 2312.660 1729 1.3376 4266 The Passion of the Christ
1.1493 1746.190 1322 1.3209 5695 Bad Santa
1.1432 3259.140 2494 1.3068 7635 Anchorman: The Legend of Ron Burgundy
1.1400 1782.990 1372 1.2996 14274 I Heart Huckabees
1.1277 1417.880 1115 1.2716 14454 Kill Bill: Vol. 1
1.1061 1622.160 1326 1.2234 12232 Lost in Translation
1.0725 2483.450 2159 1.1503 361 The Phantom of the Opera: Special Edition
1.0690 1350.620 1182 1.1427 897 Bride and Prejudice
1.0569 2828.510 2532 1.1171 16640 Closer
1.0540 5864.100 5279 1.1108 5991 Sin City
1.0514 6362.060 5755 1.1055 3282 Sideways
1.0468 3146.000 2871 1.0958 3333 The Village
1.0402 3781.630 3495 1.0820 1719 The Life Aquatic with Steve Zissou
That NY Times article mentions that Bertoni's list of 25-most-difficult-to-predict movies included: I Heart Huckabees, Lost in Translation, Fahrenheit 9/11, The Life Aquatic, Kill Bill, and Sideways. All of those movies are in the list above. So I think I'm getting the same patterns of results that are discussed in the article, despite the fact that my overall RMSE lags the leaders (0.8848, alas...).
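For anyone who wants to produce the same kind of list, here is a minimal sketch of the per-movie RMSE computation described above. It assumes you already have (movie ID, actual rating, predicted rating) triples for the probe set; the function name `per_movie_rmse` and the `min_count` parameter are made up for illustration and are not from any poster's code.

```python
import math
from collections import defaultdict

def per_movie_rmse(rows, min_count=1000):
    """Group squared errors by movie and return the hardest movies first.

    rows: iterable of (movie_id, actual_rating, predicted_rating) triples
          (an assumed input format, e.g. built from the probe set).
    min_count: drop movies with fewer predictions than this.
    Returns a list of (rmse, squared_error, count, movie_id) tuples.
    """
    sq_err = defaultdict(float)
    count = defaultdict(int)
    for movie_id, actual, predicted in rows:
        sq_err[movie_id] += (actual - predicted) ** 2
        count[movie_id] += 1

    table = [
        (math.sqrt(sq_err[m] / count[m]), sq_err[m], count[m], m)
        for m in count
        if count[m] >= min_count
    ]
    # Highest per-movie RMSE first
    table.sort(key=lambda row: row[0], reverse=True)
    return table
```

As a check against the table above: Napoleon Dynamite has a squared error of 8379.240 over 5287 predictions, so the MSE is about 1.5849 and the RMSE about 1.2589.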
Last edited by chef-ele (2008-11-23 14:44:55)
Offline
This one cracks me up: http://www.netflix.com/Movie/Twister/60 … 257360_1_0
Particularly if you read the reviews and look at the total number of ratings.
Offline
It's also interesting, for me anyway, to sort by the percentage of squared error that each movie contributes to the overall RMSE score. For instance, a movie like "Batman Begins" has a movie-specific RMSE of 0.7704, which certainly looks great, but because there are 12056 ratings for Batman Begins, it contributes 7156 to the overall squared error of 1131783 (or 0.63%).
Below are my top 15 using this criterion. There are some clear overlaps (Napoleon Dynamite, Sideways, etc). But I think it's also interesting to note that small improvements in predicting What Women Want, Batman Begins, Crash, etc - movies that all have decent movie-specific RMSEs - would be very helpful.
MovieID Scores SqErr RMSE %Total Title
2152 9979 8612.3465 0.9290 0.7610 What Women Want
3151 5287 8215.4221 1.2466 0.7259 Napoleon Dynamite
3864 12056 7156.1952 0.7704 0.6323 Batman Begins
13255 8142 7082.5500 0.9327 0.6258 Crash
1307 8154 6641.5898 0.9025 0.5868 S.W.A.T.
3282 5755 6420.6328 1.0562 0.5673 Sideways
1145 8988 6196.8724 0.8303 0.5475 The Wedding Planner
5991 5279 6004.4476 1.0665 0.5305 Sin City
528 5216 5394.2187 1.0169 0.4766 The Hitchhiker's Guide to the Galaxy
6255 6765 5337.6030 0.8883 0.4716 Bewitched
313 5859 5215.5594 0.9435 0.4608 Pay it Forward
5239 7531 5048.7674 0.8188 0.4461 The Longest Yard
2913 5759 5023.0223 0.9339 0.4438 Finding Neverland
5317 5229 4907.2340 0.9687 0.4336 Miss Congeniality
11812 6209 4859.4754 0.8847 0.4294 Million Dollar Baby
By the way, the overall RMSE for this particular set was 0.8964. Anybody else want to post this sort of list?
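The arithmetic behind the %Total column can be sketched as follows. This is only an illustration using the Batman Begins figures quoted in the post; the helper names `squared_error` and `error_share` are hypothetical.

```python
def squared_error(rmse, n_ratings):
    """Total squared error implied by a per-movie RMSE and a rating count."""
    return n_ratings * rmse ** 2

def error_share(movie_sq_err, total_sq_err):
    """Percentage of the overall squared error contributed by one movie."""
    return 100.0 * movie_sq_err / total_sq_err

# Batman Begins: RMSE 0.7704 over 12056 probe ratings
approx_sq_err = squared_error(0.7704, 12056)   # close to the 7156.1952 in the table
share = error_share(7156.1952, 1131783)        # roughly 0.63 percent
print(f"{approx_sq_err:.1f} {share:.2f}%")
```

The point of ranking by share rather than by RMSE is that a heavily rated movie with a good RMSE can still carry more total squared error than a lightly rated movie with a bad one.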
Offline
It is an interesting experiment to evaluate which movies contribute most to the error on the probe set. The table of the top 15 is sorted in descending order of squared error. There is a lot of overlap with the list provided by Clueless; it seems that "Napoleon Dynamite" is one of the hardest nuts for many. A predictor with 0.8688 RMSE was used to generate this evaluation.
The curious thing is that there are 152k ratings in the training set for "What Women Want", yet it produces the largest error on the probe set. Note that the average number of ratings per movie in the dataset is only about 5k.
MovieID #Probe #Train SqErr RMSE Title
2152 9979 152618 8112.24 0.9016 What Women Want
3151 5287 111075 7877.39 1.2206 Napoleon Dynamite
3864 12056 42866 6839.61 0.7532 Batman Begins
13255 8142 56932 6759.22 0.9111 Crash
1307 8154 113053 6308.31 0.8796 S.W.A.T.
3282 5755 111515 6054.59 1.0257 Sideways
1145 8988 131166 5636.07 0.7919 The Wedding Planner
5991 5279 50868 5511.98 1.0218 Sin City
6255 6765 16801 5173.82 0.8745 Bewitched
528 5216 23238 5109.49 0.9897 The Hitchhiker's Guide to the Galaxy
313 5859 93953 4853.52 0.9102 Pay It Forward
2913 5759 101684 4801.18 0.9131 Finding Neverland
5239 7531 45632 4769.6 0.7958 The Longest Yard
11812 6209 96652 4587.95 0.8596 Million Dollar Baby
1962 5878 139641 4481.51 0.8732 50 First Dates
And the same table sorted by RMSE:
MovieID #Probe #Train SqErr RMSE Title
3151 5287 111075 7877.39 1.2206 Napoleon Dynamite
14890 1758 46354 2584.74 1.2125 Team America: World Police
11022 1405 101700 1998.33 1.1926 Fahrenheit 9/11
5695 1322 64308 1698.17 1.1334 Bad Santa
4266 1729 83321 2157.15 1.1170 The Passion of the Christ
7635 2494 104589 3088.61 1.1128 Anchorman: The Legend of Ron Burgundy
14454 1115 139449 1352.93 1.1015 Kill Bill: Vol. 1
14274 1372 47666 1635 1.0916 I Heart Huckabees
12232 1326 151080 1527.44 1.0733 Lost in Translation
361 2159 33690 2395.84 1.0534 The Phantom of the Opera: Special Edition
897 1182 11174 1293.72 1.0462 Bride and Prejudice
16640 2532 79349 2685.29 1.0298 Closer
3333 2871 86843 3026.49 1.0267 The Village
3282 5755 111515 6054.59 1.0257 Sideways
5991 5279 50868 5511.98 1.0218 Sin City
BigChaos
Last edited by ch (2008-11-23 13:27:51)
Offline
I generated my own list using chef-ele's and Clueless's criteria. The lists are surprisingly similar. First, the movies with the highest average error among those with at least 1000 ratings in the probe set (top 16):
MovieID Ratings SquaredErr RMSE %Total %Cumul Title
3151 5287 7881.16 1.2209 0.7385 0.7385 Napoleon Dynamite
14890 1758 2588.89 1.2135 0.2426 0.9811 Team America: World Police
11022 1405 1993.89 1.1913 0.1868 1.1679 Fahrenheit 9/11
5695 1322 1687.58 1.1298 0.1581 1.3260 Bad Santa
4266 1729 2173.17 1.1211 0.2036 1.5297 The Passion of the Christ
7635 2494 3089.54 1.1130 0.2895 1.8192 Anchorman: The Legend of Ron Burgundy
14454 1115 1353.40 1.1017 0.1268 1.9460 Kill Bill: Vol. 1
14274 1372 1663.91 1.1013 0.1559 2.1019 I Heart Huckabees
12232 1326 1529.96 1.0742 0.1434 2.2452 Lost in Translation
361 2159 2381.09 1.0502 0.2231 2.4684 The Phantom of the Opera: Special Edition
897 1182 1301.13 1.0492 0.1219 2.5903 Bride and Prejudice
16640 2532 2710.60 1.0347 0.2540 2.8443 Closer
3333 2871 3043.02 1.0295 0.2851 3.1294 The Village
5991 5279 5551.56 1.0255 0.5202 3.6496 Sin City
3282 5755 6048.85 1.0252 0.5668 4.2164 Sideways
4634 1394 1439.30 1.0161 0.1349 4.3513 Me and You and Everyone We Know
Second, the list of movies contributing the most to the total error (top 15):
MovieID Ratings SquaredErr RMSE %Total %Cumul Title
2152 9979 8177.99 0.9053 0.7663 0.7663 What Women Want
3151 5287 7881.16 1.2209 0.7385 1.5048 Napoleon Dynamite
3864 12056 6871.42 0.7550 0.6439 2.1486 Batman Begins
13255 8142 6747.89 0.9104 0.6323 2.7809 Crash
1307 8154 6357.27 0.8830 0.5957 3.3766 S.W.A.T.
3282 5755 6048.85 1.0252 0.5668 3.9434 Sideways
1145 8988 5653.13 0.7931 0.5297 4.4731 The Wedding Planner
5991 5279 5551.56 1.0255 0.5202 4.9933 Sin City
6255 6765 5186.35 0.8756 0.4860 5.4793 Bewitched
528 5216 5094.97 0.9883 0.4774 5.9567 The Hitchhiker's Guide to the Galaxy
313 5859 4878.63 0.9125 0.4571 6.4138 Pay It Forward
2913 5759 4809.67 0.9139 0.4507 6.8645 Finding Neverland
5239 7531 4799.45 0.7983 0.4497 7.3142 The Longest Yard
11812 6209 4589.02 0.8597 0.4300 7.7442 Million Dollar Baby
1962 5878 4501.38 0.8751 0.4218 8.1660 50 First Dates
Last edited by PragmaticTheory (2008-11-23 12:27:08)
Offline
I hadn't seen BigChaos' posting before writing mine. Wow: a perfect match!
Offline
There's obviously a lot of agreement. My numbers come from the probe set posted at http://www.ofadifferentkind.com/probe.1 … 08.txt.zip. The probe score is 0.8815 and the qualifying score is 0.8767.
ID RMSE SE Count Title
2152 0.9124 8307.04 9979 What Women Want
3151 1.2358 8074.00 5287 Napoleon Dynamite
3864 0.7574 6915.61 12056 Batman Begins
13255 0.9174 6853.09 8142 Crash
1307 0.8904 6464.53 8154 S.W.A.T.
3282 1.0386 6207.39 5755 Sideways
1145 0.8057 5834.46 8988 The Wedding Planner
5991 1.0356 5662.09 5279 Sin City
6255 0.8805 5244.95 6765 Bewitched
528 0.9999 5214.58 5216 The Hitchhiker's Guide to the Galaxy
313 0.9199 4958.22 5859 Pay It Forward
5239 0.8014 4836.13 7531 The Longest Yard
2913 0.9149 4820.56 5759 Finding Neverland
11812 0.8633 4627.76 6209 Million Dollar Baby
1962 0.8802 4553.90 5878 50 First Dates
Offline
Some of those movies also tend to show up in a cluster based on similarity. For instance, here are the top 30 movies computed as being most similar to Napoleon Dynamite
http://gflix.appspot.com/netflix/3150
You can click on the individual posters to browse through to a list of similar movies for that movie.
Offline
I think these results above seem to follow with my preliminary observation that binning on a movie-by-movie basis for blending is not all that productive.
Offline
Aron wrote:
I think these results above seem to follow with my preliminary observation that binning on a movie-by-movie basis for blending is not all that productive.
I tried binning by movie a while back, too, and had very poor results. Binning by user is, as expected, even worse (but I had to try it). After taking a couple of months off - there were just too many other things to do - I'm back at work on my GA-based mixer and some other odds and ends. I've got the mixer to a point where it can do slightly better than non-binned linear combination, and preliminary test results from my most recent version look promising. Don't get me wrong, I don't think this is a contest-winning strategy by any means, but it might shave enough off of my qualifying score to push me a bit closer to the top - without having to resort to implementing things that I don't really understand.
Offline
Clueless wrote:
I'm back at work on my GA-based mixer and some other odds and ends. I've got the mixer to a point where it can do slightly better than non-binned linear combination, and preliminary test results from my most recent version look promising.
Interesting. I tried a GA-based mixer too for a few days, using a small number of files, but it never out-performed my linear regression mixer, not to mention it took forever to run.
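For readers unfamiliar with the baseline being compared against, here is a rough sketch of a non-binned linear regression mixer: fit one weight per predictor plus an intercept by least squares on a held-out set such as the probe. This is not anyone's actual mixer from this thread; the function names are hypothetical and the solver is a plain normal-equations approach that is only reasonable for a small number of predictors.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_blend_weights(predictions, actual):
    """Least-squares weights for blending several predictors.

    predictions: list of rows, one per rating, each holding every model's output.
    actual: list of true ratings, same length.
    Returns one weight per model followed by an intercept.
    """
    rows = [list(p) + [1.0] for p in predictions]  # append intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, actual)) for i in range(k)]
    return solve_linear_system(xtx, xty)

def blend(prediction_row, weights):
    """Apply fitted weights (last one is the intercept) to one row of model outputs."""
    return sum(w * p for w, p in zip(weights, list(prediction_row) + [1.0]))
```

On the data it was fitted to, this blend can never score worse than any single input predictor, which is part of why it is such a hard baseline for a GA-based mixer to beat.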
Last edited by Newman! (2008-11-24 11:13:57)
Offline
My first attempt did poorly - took forever and never outperformed linear regression. My newest variation uses a combination of Particle Swarm Optimization (PSO) and Genetic Programming (GP) rather than classical GA. The tricky part, as always, is coming up with a good way to have the "critters" model the problem space.
Offline
I registered just to throw out this newbie suggestion.
Perhaps Netflix could deal with the "Napoleon Dynamite" problem by giving customers two sorts of recommendations:
1) It is extremely probable that you will like this movie.
2) You may adore this movie or you may hate it; you're unlikely to be indifferent. We can't predict which. Take a chance!
As a customer, I could pick 1) or 2) depending on my mood and my willingness to experiment.
If you can't solve the problem, segregate it.
Offline
Another interesting question is which movies are easy to predict. Here is a table of the movies that have at least 1000 ratings in the probe set, sorted in ascending order of RMSE. It seems that movies with a high average rating are easy to predict (TrAvg is the average rating in the training set). This experiment uses the same probe prediction as posted before.
MovieID #Probe #Train SqErr RMSE TrAvg Title
7057 1282 73630 231.748 0.4252 4.7017 Lord of the Rings: The Two Towers: Extended Edition
7230 1148 72274 253.918 0.4703 4.7163 The Lord of the Rings: The Fellowship of the Ring: Extended Edition
5293 1103 88245 331.948 0.5486 3.9345 Patriot Games
5582 1283 91187 433.416 0.5812 4.5441 Star Wars: Episode V: The Empire Strikes Back
3610 1859 73289 633.575 0.5838 3.8070 Lethal Weapon 3
14550 1848 137812 640.339 0.5886 4.5931 The Shawshank Redemption: Special Edition
13673 1113 50193 386.392 0.5892 4.3488 Toy Story
1798 4553 108824 1655.99 0.6031 3.9637 Lethal Weapon
2452 1934 147932 724.566 0.6121 4.4339 Lord of the Rings: The Fellowship of the Ring
270 1815 34616 684.529 0.6141 4.2958 Sex and the City: Season 4
3290 1326 70288 552.69 0.6456 4.4039 The Godfather, Part II
3456 1491 5758 626.73 0.6483 4.6784 Lost: Season 1
3962 1929 139050 880.2 0.6755 4.4151 Finding Nemo (Widescreen)
1476 1214 10615 555.013 0.6761 4.4653 Six Feet Under: Season 4
241 1250 41872 571.806 0.6763 4.1568 North by Northwest
BigChaos
Offline
thx much to the teams sharing info .. these are very fascinating results.
there is an early post where someone plots the standard deviation of the ratings versus the average movie rating. maybe someone else can link that up to this thread if they recall it. as I recall, typically the std. dev. is low for low rated and high rated movies. leading to a "frown" in a graph of avg movie rating on x axis and std dev on y axis. that seems to explain (most of?) BigChaos's observation above that high rated movies are easier to predict.
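One way to see why a "frown" shape is expected: for ratings confined to the 1 to 5 scale, the standard deviation of a movie's ratings cannot exceed sqrt((mean - 1) * (5 - mean)), a general bound for bounded values (the Bhatia-Davis inequality). The sketch below just evaluates that ceiling. It is an upper bound only, so it shows why the spread has to shrink near the extremes, not what the actual spread of any movie is; the function name is made up for illustration.

```python
import math

def max_rating_std(mean, low=1.0, high=5.0):
    """Largest possible standard deviation for ratings in [low, high] with this mean."""
    return math.sqrt((mean - low) * (high - mean))

# The ceiling peaks in the middle of the scale and falls to zero at both ends.
for mean in (1.0, 2.0, 3.0, 4.0, 4.7, 5.0):
    print(mean, round(max_rating_std(mean), 3))
```

So a movie averaging around 4.7, like the Lord of the Rings extended editions in BigChaos's table, simply has much less room for disagreement than a movie averaging around 3.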
as I read these results, there are several possible conclusions.
a) after a certain level of high tuning, all the algorithms are "sucking" the same signal out of the data, and it doesnt really matter what algorithm is used; moreover, the signal limit is a constant for each movie. this is esp true the more the different algorithms are independently built/designed. also, the contest may be nearing this theoretical limit.
b) the top teams may be sharing algorithms somewhat freely, in which case each algorithm is not so independent. (maybe indirectly/inadvertently, by reading/replicating results published in the conf papers)
(a), (b) are somewhat opposite findings (based on dependent vs independent algorithms) yet some combination of (a) + (b) is probably the case.
Last edited by vzn (2008-11-27 13:53:41)
Offline
vzn wrote:
(a), (b) are somewhat opposite findings (based on dependent vs independent algorithms) yet some combination of (a) + (b) is probably the case.
Right on, there is likely an inherent randomness (case a) for each movie and nobody/nothing can beat this.
Let's hope that there is enough left of the case b margin to allow for more progress, though this progress will not come from "improvements" by the very definition of case b.
Offline
I have a question, but since my understanding of statistics and programming is pretty basic, it could sound pretty foolish. From the New York Times article, I inferred that the current algorithms are capable of ferreting out small pieces of information about movies and then predicting a user's rating from how much they appreciate those factors. This sounds similar to a regression where I input all of the factors, genre, release date, actors, themes, etc., but, I guess the algorithm does it without more inputs.
Do these algorithms account for the variability of the ratings themselves? Is it possible to use the variability of the users' ratings for a particular movie as a predictor? For example, Napoleon Dynamite has a high variability in ratings, users who rated ND highly might also apply this "X" factor to Sideways.
Offline
ADifferentName wrote:
There's obviously a lot of agreement. My numbers come from the probe set posted at http://www.ofadifferentkind.com/probe.1 … 08.txt.zip. The probe score is 0.8815 and the qualifying score is 0.8767.
Very impressive score! What algos are you blending?
Offline
Hi. That particular submission was a blend of 187 files. Most of those files probably contributed very little to the final rmse.
At that time, I was using the results from two open source projects and five different matrix factorization algorithms. The open source projects were nprize and Kadence's kNN.
The five flavors of matrix factorization that I used were:
1.) NSVD1 implemented as best as I could from Gravity's paper A Unified Approach of Factor Models and Neighbor Based Methods for Large Recommender Systems.
2.) BRISMF described in Gravity's paper Investigation of Various Matrix Factorization Methods for Large Recommender Systems.
3.) A hybrid NSVD1/BRISMF described in Gravity's paper "A Unified Approach..." (the same paper where the NSVD1 implementation is described).
4.) SVD with simultaneous factor updates (described in many posts on the Netflix forums).
5.) SVD++ from BellKor's paper Factorization Meets the Neighborhood: a Multifaceted Collaborative Filtering Model.
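For readers who have not seen this family of methods, here is a bare-bones sketch of matrix factorization trained by stochastic gradient descent, with all factors updated simultaneously for each rating. It is a generic illustration only, not NSVD1, BRISMF, or SVD++ as described in those papers, and not Greg's code; every name and hyperparameter value below is an assumption chosen for a toy example.

```python
import math
import random

def train_mf(ratings, n_users, n_items, n_factors=2, lr=0.05, reg=0.02,
             epochs=200, seed=0):
    """Plain regularized matrix factorization trained with SGD.

    ratings: list of (user_index, item_index, rating) triples.
    Returns (global_mean, user_factors, item_factors).
    """
    rng = random.Random(seed)
    mean = sum(r for _, _, r in ratings) / len(ratings)
    p = [[rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for _ in range(n_users)]
    q = [[rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            pred = mean + sum(a * b for a, b in zip(p[u], q[i]))
            err = r - pred
            # Update every factor of this user and item in the same step
            for f in range(n_factors):
                pu, qi = p[u][f], q[i][f]
                p[u][f] += lr * (err * qi - reg * pu)
                q[i][f] += lr * (err * pu - reg * qi)
    return mean, p, q

def predict(model, u, i):
    """Predicted rating: global mean plus the user/item factor dot product."""
    mean, p, q = model
    return mean + sum(a * b for a, b in zip(p[u], q[i]))

def rmse(model, ratings):
    """Root mean squared error of the model on the given ratings."""
    total = sum((r - predict(model, u, i)) ** 2 for u, i, r in ratings)
    return math.sqrt(total / len(ratings))
```

Real contest implementations add user and movie biases, per-parameter learning rates, and many more factors; the files that end up in a blend typically differ in exactly those details.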
I never really got BellKor's SVD++ to work, so I used my own concoction of mixing NSVD1 with BRISMF with some cross training of the factors. I tried to integrate the ideas of the SVD part of SVD++ into code that I already had working.
And I mixed them together using Gravity's linreg linear regression program.
I guess if you follow team Gravity around and try to pick up bits and pieces from them, you can end up with a decent score :-)
Good luck!
Greg
Offline