Forum for discussion about the Netflix Prize and dataset.
I finally bit the bullet and implemented the global effects as described by BellKor (that was a lot of tedious coding). However, I found there seems to be no single best set of weights: I could get more or less the same results with very different weights, and no, blending them doesn't improve the final RMSE much. Has anyone else seen similar results?
Does this mean there are other undiscovered effects lurking around?
Last edited by Newman! (2008-01-24 00:37:49)
Offline
I got very similar results to those published in the paper by Bell and Koren. By "weights," are you referring to shrinkage parameters? If so, check out http://www.netflixprize.com/community/v … php?id=772.
(edit: sorry, I had misread the OP)
Last edited by Paeva (2008-01-24 13:46:04)
Offline
Newman! wrote:
I finally bit the bullet and implemented the global effects as described by BellKor (that was a lot of tedious coding). However, I found there seems to be no single best set of weights: I could get more or less the same results with very different weights, and no, blending them doesn't improve the final RMSE much. Has anyone else seen similar results?
Does this mean there are other undiscovered effects lurking around?
Newman
There are many other global effects waiting to be discovered. My best score using only global effects is .9503 (on the quiz dataset) - which is better than the score reported by Netflix at the beginning of the contest after they had performed their collaborative filtering.
The secret to finding them is to think carefully about what is actually going on. For example, when calculating the means, it is important to realise that simply taking the average score on the training set is biased. What I mean by this is that some customers may have seen only lousy movies, so they have a lower average score. This gives the impression that they tend to mark downwards. However, if they had seen movies of average quality in the training set, their average score would be higher. If you take just this effect into account, you can reduce the quiz score using movie and customer means from .9826 to .9799.
More programming I'm afraid.
Gavin
Offline
Dan, yes, I could duplicate the results reported by BellKor and Bored Bitless.
Gavin, would you care to share which other global effects you have used?
Personally, I thought of 2 additional effects:
1. Day of the week effect: for example, do people tend to rate higher on Saturdays and lower on Mondays? I actually coded this, but it only improved my probe RMSE by 0.0001. Similarly, there might be a Christmas Day effect, a New Year's Day effect, a Valentine's Day effect, etc. Because of the large number of ratings behind each of these effects, I think shrinkage is unnecessary.
2. Menstruation effect: women's ratings are affected by their mood, which varies with their menstrual cycles. To detect this reliably, we would need a large number of ratings over a long period of time for each viewer, which we don't have, so shrinkage should be applied. I haven't implemented this yet, but it sounds just politically incorrect enough to be worth a try.
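The day-of-week effect in item 1 can be sketched as a per-weekday mean of the residuals left over after the earlier effects have been removed. This is only a minimal sketch: the function name `weekday_effect`, the `alpha` shrinkage parameter, and the toy data are assumptions made for illustration, not something taken from the BellKor paper.

```python
from collections import defaultdict
from datetime import date

def weekday_effect(dates, residuals, alpha=0.0):
    """Mean residual per weekday (Monday=0 ... Sunday=6).

    alpha > 0 shrinks each weekday's estimate toward zero; alpha = 0
    means no shrinkage, as suggested in the post for this effect.
    """
    total = defaultdict(float)
    count = defaultdict(int)
    for d, res in zip(dates, residuals):
        total[d.weekday()] += res
        count[d.weekday()] += 1
    return {wd: total[wd] / (count[wd] + alpha) for wd in total}

# Hypothetical residuals (rating minus the prediction of earlier effects).
# 2005-01-01 and 2005-01-08 are Saturdays; 2005-01-03 and 2005-01-10 are Mondays.
dates = [date(2005, 1, 1), date(2005, 1, 8), date(2005, 1, 3), date(2005, 1, 10)]
residuals = [0.2, 0.4, -0.1, -0.3]

effect = weekday_effect(dates, residuals)

# Remove the effect so that the next effect is fitted on what is left.
adjusted = [res - effect[d.weekday()] for d, res in zip(dates, residuals)]
```

A holiday effect could reuse the same grouping with a different key (for example, a flag for "is December 25"), though with far fewer ratings per group some shrinkage would probably be needed.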
Offline
I played around with this a little myself and found a couple of useful "global effects". I agree with Gavin's observation that a little cognitive modeling is very helpful.
Consider this: What is the one piece of information that a person can't avoid seeing when rating a movie? The average rating of the movie at that point in time.
And this: who will be "lined up" to rate a movie that's just been released: people who will probably like the movie, or people who will probably dislike it? And a corollary: movies (those with a decent number of ratings, anyway) that were "released" during the period covered by the training data (i.e. not oldies) exhibit a ratings "arc" (I'm not using the geometric definition of the term, of course).
Of course I'm leaving the implementation details as an exercise... LOL
P.S. If anybody cares, I now have an unblended method that scores < 0.9 on the quiz. Yay!
Offline
gavin wrote:
For example, when calculating the means, it is important to realise that simply taking the average score on the training set is biased. What I mean by this is that some customers may have seen only lousy movies, so they have a lower average score. This gives the impression that they tend to mark downwards. However, if they had seen movies of average quality in the training set, their average score would be higher. If you take just this effect into account, you can reduce the quiz score using movie and customer means from .9826 to .9799.
Gavin
As I recall in the Bellkor global effects approach, each effect is trained on the residuals from the step before it. So, for example, if the first effect is the "movie mean" effect, then the second effect - the main customer effect - will be calculated on residuals from the first effect.
This approach should already address the point you are making (if a customer has rated lousy movies then the raw mean of his ratings could lead you to the false impression that they "mark downwards").
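A minimal sketch of that residual-by-residual fitting, assuming each effect is a simple group mean with a shrinkage constant `alpha` (the helper names `fit_effect` and `apply_effect`, the toy ratings, and `alpha=0.0` are illustrative assumptions, not the paper's code or settings):

```python
from collections import defaultdict

def fit_effect(rows, key, residuals, alpha):
    """Shrunk mean residual per group: sum(residuals) / (n + alpha)."""
    total = defaultdict(float)
    count = defaultdict(int)
    for row, res in zip(rows, residuals):
        total[row[key]] += res
        count[row[key]] += 1
    return {g: total[g] / (count[g] + alpha) for g in total}

def apply_effect(rows, key, residuals, effect):
    """Subtract the fitted effect, leaving residuals for the next effect."""
    return [res - effect.get(row[key], 0.0) for row, res in zip(rows, residuals)]

# Toy data: user "a" has only seen the two lousy movies (m1, m2).
rows = [
    {"user": "a", "movie": "m1", "rating": 2},
    {"user": "a", "movie": "m2", "rating": 1},
    {"user": "b", "movie": "m1", "rating": 2},
    {"user": "b", "movie": "m3", "rating": 5},
    {"user": "c", "movie": "m2", "rating": 1},
    {"user": "c", "movie": "m3", "rating": 5},
]

global_mean = sum(r["rating"] for r in rows) / len(rows)
residuals = [r["rating"] - global_mean for r in rows]

# Step 1: movie effect, fitted on ratings minus the global mean.
movie_effect = fit_effect(rows, "movie", residuals, alpha=0.0)
residuals = apply_effect(rows, "movie", residuals, movie_effect)

# Step 2: user effect, fitted on what the movie effect left behind.
user_effect = fit_effect(rows, "user", residuals, alpha=0.0)

raw_mean_a = sum(r["rating"] for r in rows if r["user"] == "a") / 2
```

In this toy set user "a" has a raw mean of 1.5, well below the global mean, yet the user effect fitted on the residuals is essentially zero, because "a" rated each lousy movie exactly as everyone else did. On real data `alpha` would need to be positive and tuned; with shrinkage the movie effect no longer absorbs everything, so some of the bias Gavin describes can remain.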
Offline
Newman! wrote:
Dan, yes I could duplicate the results reported by BellKor and Bored Bitless.
Gavin, do you care to share what other global effects you have used ?
To be honest, it's not so much the effects as an underlying approach of trying to think about what is really going on in the situation that helps. You can't get everything just from performing a straight statistical analysis.
I think there are two levels at which this can be approached. Take your idea about "feel good days" and "feel bad days", which is a good one. At the first level: can we identify days (days of the week etc.) when the ratings tend to be higher or lower, and does it make a difference? At the second level: are there movies that are more susceptible to the day-of-the-week effect than others (maybe popular ones, as measured by number of ratings, versus the rest)? Does the day-of-the-week effect vary over time (maybe initially, when the enthusiasts are watching, it doesn't make much difference, and later on it has a bigger impact)?
You can squeeze quite a bit more out of the score just by modelling all these effects, one tiny step at a time...
Offline
gavin wrote:
To be honest, it's not so much the effects as an underlying approach of trying to think about what is really going on in the situation that helps. You can't get everything just from performing a straight statistical analysis.
I couldn't agree more. You can go pretty far using a variety of "blind" numerical optimizations and/or statistical analyses, but you'll eventually hit a wall. After that you need to start thinking about what makes the people who rate movies tick. I like your idea about "feel good" and "feel bad" days (from a global perspective) - I think I'll take a look at the ratings for the weeks following 09/11/01 (and possibly some other important dates) just to see if there's a discernible pattern.
Then I'm back to my "blind" numerical approach. I haven't hit that wall yet.
...one tiny step at a time...
Yes. Tiny. The "no free lunch" rule clearly applies.
Offline
Gavin, I am very impressed by the RMSE=0.9503 achieved by pure global effects.
Global effects are neat and intuitive, unlike the latent factor models (SVD and the likes) that by definition seek hidden, less explainable factors.
However, I don’t expect that improving global effects will have much impact on the final RMSE. Latent factor models have proven much more effective at recovering the same information, and much more besides. Of course, I might be wrong about this...
Yehuda
Offline
Clueless wrote:
- I think I'll take a look at the ratings for the weeks following 09/11/01 (and possibly some other important dates) just to see if there's a discernible pattern.
That would be very interesting - perhaps we could find some mood indicator (the change in stock prices?) and correlate ratings to that. The beauty of this dataset is that it's so large that it is possible to discern trends that would be undetectable in smaller datasets. I think we need to persuade more psychologists to get involved in the competition to complement the mathematical approach.
I should declare that I am a psychologist - running as "Just a guy in a garage". Are there any others out there?
Gavin
Offline
gavin wrote:
perhaps we could find some mood indicator (the change in stock prices?) and correlate ratings to that
Excellent idea! I'll try an S&P 500 index effect. This is much better than my menstruation index idea. If this works, Netflix might be able to make billions by predicting the stock market based on movie ratings.
Yehuda, one thing you didn't mention in your paper is how you determined the order of the effects.
Last edited by Newman! (2008-01-25 14:26:45)
Offline
YehudaKoren wrote:
I don’t expect that improving global effects will have much impact on the final RMSE. Latent factor models have proven much more effective at recovering the same information, and much more besides. Of course, I might be wrong about this...
Judging from my (haphazard and fledgling) experiments I have to agree - although feeding the global-effects-adjusted ratings to even a half-decent nearest-neighbor implementation gives decent results.
Personally I have been playing with "global effects" primarily to get a better gut feel of the data - plus it's kinda fun.
Offline
Newman! wrote:
Excellent idea ! I'll try an S&P 500 index effect. This is much better than my menstruation index idea. If this works, Netflix might be able to make billions by predicting the stock market based on movie ratings.
On thinking about this - most countries have some form of consumer confidence index - that really might be predictable from the ratings data (or vice versa). I suspect it would be easier to fit than the S&P.
Offline
Based on this discussion, it seems like there's a whole lot of small psychological factors out there to consider...
I remember finding that the closer you get to the weekend, the higher the average rating of the movies rated. The effect was small (~0.01 stars), but I'd suspect it also has a psychological basis (at least for me).
And maybe a weather index might be useful, too. There was a study a while back that showed that stock market returns have a weak but statistically significant correlation with the amount of sunshine (see this article). The theory was that sunshine makes people slightly more optimistic, and therefore slightly more bullish. But optimism could also boost movie ratings...
Back to the stock market index idea: It might also be useful to use the percentage change in the index over a short period (1 day, 1 week, 1 month), rather than the index itself. It's the swings in the market that make the news & make people nervous, not the value of the index itself. Anyway, there's lots of free historical market data at Yahoo, for example, this.
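The percentage-change idea can be sketched as below. The function name `pct_change`, the `lag` parameter, and the closing prices are made up for illustration; real index data would have to be loaded from whatever source is at hand.

```python
def pct_change(values, lag=1):
    """Percentage change versus the value `lag` steps earlier.

    Positions with no earlier value to compare against are None.
    """
    out = [None] * min(lag, len(values))
    for i in range(lag, len(values)):
        prev = values[i - lag]
        out.append(100.0 * (values[i] - prev) / prev)
    return out

# Hypothetical daily closes of an index.
closes = [100.0, 110.0, 99.0, 108.9]

daily = pct_change(closes, lag=1)    # one-day swings
two_day = pct_change(closes, lag=2)  # swings over a two-day window
```

With daily closes, a lag of about 5 would approximate one trading week and about 21 one trading month; those numbers are rules of thumb, not part of the original post.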
And you know, I wonder if only people who watched "Wall Street" might show this 'stock market effect' !
Last edited by chef-ele (2008-01-26 07:01:02)
Offline
chef-ele wrote:
And you know, I wonder if only people who watched "Wall Street" might show this 'stock market effect' !
I agree. At the moment all of the models I use assume that these 'psychological' effects apply to everyone equally. I think you could probably get a better result if you took into account that some people are more influenced by different effects than others. I'm just starting to try and figure a way of modelling this - I don't have an answer yet.
Gavin
Offline
chef-ele wrote:
I wonder if only people who watched "Wall Street" might show this 'stock market effect' !
I was curious about this, so I did a very quick-and-dirty analysis of S&P500 monthly returns vs the average monthly movie ratings of two movies: "Wall Street" and "Broadcast News". I picked Broadcast News because it has a different theme (& perhaps audience) than Wall Street, so it would provide some contrast. It also had a similar number of ratings (~20k) & came out in the same year (1987).
A plot of the results is here. It's hard to tell much from the plot, but the correlation between the monthly SP500 market returns & "Wall St" ratings was indeed higher than the correlation with "Broadcast News" (0.211 for Wall St, 0.179 for Broadcast News). But there were only 63 months of data that I could use, so it's not a statistically significant difference.
I was surprised to see that both movies had correlations of around 0.2; it's not high, but I thought it would be much closer to zero. If I do some Fisher transforms & statistical tests, both correlations are different from zero at a >90% confidence level. Still, I wouldn't have thought both movies might be correlated with the stock market... maybe I should try a children's movie next. Or perhaps the market might be a proxy for something else (time, etc.). Interesting...
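For anyone who wants to repeat the Fisher-transform check, here is a minimal sketch using only the figures quoted above (r = 0.211 and r = 0.179, n = 63 months). The function names are made up, and the normal approximation and two-sided p-values are assumptions about how the test was run; the post does not say.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def fisher_z_test(r, n):
    """Test H0: true correlation is zero, via Fisher's z = atanh(r)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    p_two_sided = 2.0 * (1.0 - normal_cdf(abs(z)))
    return z, p_two_sided

def compare_correlations(r1, n1, r2, n2):
    """Test H0: two correlations are equal, assuming independent samples."""
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (math.atanh(r1) - math.atanh(r2)) / se
    p_two_sided = 2.0 * (1.0 - normal_cdf(abs(z)))
    return z, p_two_sided

z_wall, p_wall = fisher_z_test(0.211, 63)     # "Wall Street"
z_news, p_news = fisher_z_test(0.179, 63)     # "Broadcast News"
z_diff, p_diff = compare_correlations(0.211, 63, 0.179, 63)
```

Under these assumptions the "Wall Street" correlation gives z of roughly 1.66 (two-sided p just under 0.10), the "Broadcast News" one gives z of roughly 1.40, so whether it clears 90% depends on whether the test is one- or two-sided, and the difference between the two gives z of roughly 0.18, nowhere near significant. Note that the comparison treats the two samples as independent, which is not strictly true here because both are correlated against the same S&P series.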
Last edited by chef-ele (2008-01-26 17:05:00)
Offline
chef-ele wrote:
I was surprised to see that both movies had correlations of around 0.2; it's not high, but I thought it would be much closer to zero. If I do some Fisher transforms & statistical tests, both correlations are different from zero at a >90% confidence level. Still, I wouldn't have thought both movies might be correlated with the stock market... maybe I should try a children's movie next. Or perhaps the market might be a proxy for something else (time, etc.). Interesting...
Or perhaps some of the factors that influence the stock market also influence movie ratings? If so, the type of movie shouldn't matter much. Interesting.
Offline
Newman! wrote:
Yehuda, one thing you didn't mention in your paper is how you determined the order of the effects.
The order that we used is the one shown in the table within the paper (from top to bottom). It is quite arbitrary. Of course you want to start with the main effects, as they are so strong that they overwhelm anything else. Beyond that, other orders might have worked better.
In fact, there is a more fundamental approach to this that enables squeezing extra points from the global effects, but I'm pessimistic about any effect this would have on the final RMSE, so why bother with this?
Yehuda
Offline
chef-ele wrote:
I was curious about this, so I did a very quick-and-dirty analysis of S&P500 monthly returns vs the average monthly movie ratings of two movies: "Wall Street" and "Broadcast News".
I was surprised to see that both movies had correlations of around 0.2 ; it's not high, but I thought it might be much closer to zero. If I do some Fisher-transforms & statistical tests, both correlations are different from zero at a >90% confidence level. Interesting...
That was impressively quick.
I also agree the result is both surprising and interesting.
Offline
After a somewhat longer analysis, I'm less sure about how valid these stock market correlations are. If you look at the average of all ratings submitted in a given week & the SP500 for each week, you get a plot like this. You can see how the major trends of the S&P (falling, then rising) line up with the rating trends (rising slowly, then quickly). This creates a net positive correlation, but both series are really just functions of time. So using the average rating around a certain date would probably be a better predictor than the SP500.
For the correlation between [% change in the SP500 in the week] vs. [average of ratings submitted in the week], there's the problem of how to deal with the rapid shift in the mean rating around 1Q2004. Generally, I've found it created some misleading correlations.
To get around the shift in the mean rating, I tried using the CHANGE in the average rating. I calculated the correlation of the [%change in weekly avg rating] with [% change in SP500], but found only about 0.02 correlation, which doesn't seem all that statistically significant. Oh well.
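The difference between correlating levels and correlating changes can be illustrated with a small sketch. Both series below are invented: they merely trend upward over time, and the helper names `pearson` and `pct_changes` are illustrative.

```python
import math

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def pct_changes(values):
    """Period-over-period percentage changes (one element shorter)."""
    return [100.0 * (b - a) / a for a, b in zip(values, values[1:])]

# Hypothetical weekly index levels and weekly average ratings; both drift up.
index = [10.0, 12.0, 11.0, 14.0, 13.0, 16.0, 15.0, 18.0]
avg_rating = [3.0, 3.1, 3.3, 3.3, 3.5, 3.6, 3.8, 3.8]

r_levels = pearson(index, avg_rating)
r_changes = pearson(pct_changes(index), pct_changes(avg_rating))
```

In this made-up data the levels correlate strongly simply because both rise over time, while the week-to-week changes do not move together at all, which is the pattern described above.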
Offline
Has anyone seen the following effect? I've been noticing that the more ratings a user submits on a certain date, the higher those ratings are likely to be. For example, averaging across all rating submissions, if a user submits 1 or 2 ratings in a given day, they'll average a bit more than 3.5 stars. But if a user submits 5 or more ratings on a given day, in those scenarios the ratings average out to be 3.6 stars or more.
Has anyone else noticed this and/or taken advantage of it? Has it been discussed already? It seems like a small effect, so I'm wondering if it's worthwhile to pursue (or if my SVD would have already found it).
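A minimal sketch of how this could be checked: count each user's ratings per date, then average the ratings within volume buckets. The bucket boundaries (1-2, 3-4, 5+), the function name, and the toy tuples are assumptions for illustration only.

```python
from collections import Counter, defaultdict

def mean_rating_by_daily_volume(ratings):
    """ratings: iterable of (user, day, rating) tuples.

    Returns the mean rating per bucket, where the bucket is decided by how
    many ratings that user submitted on that day.
    """
    ratings = list(ratings)
    per_day = Counter((user, day) for user, day, _ in ratings)

    def bucket(n):
        if n <= 2:
            return "1-2"
        if n <= 4:
            return "3-4"
        return "5+"

    total = defaultdict(float)
    count = defaultdict(int)
    for user, day, rating in ratings:
        b = bucket(per_day[(user, day)])
        total[b] += rating
        count[b] += 1
    return {b: total[b] / count[b] for b in total}

# Hypothetical data: users "a" and "b" rate lightly, user "c" rates in bulk.
toy = [
    ("a", "2005-01-01", 3),
    ("b", "2005-01-01", 4), ("b", "2005-01-01", 3),
    ("c", "2005-01-02", 4), ("c", "2005-01-02", 4), ("c", "2005-01-02", 3),
    ("c", "2005-01-02", 4), ("c", "2005-01-02", 4),
]

means = mean_rating_by_daily_volume(toy)
```

To use it as a global effect rather than a descriptive table, the same per-day count could be fed into a residual-based fit after the main movie and user effects, so that it only picks up what those have not already explained.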
Last edited by chef-ele (2008-01-28 13:00:11)
Offline
YehudaKoren wrote:
In fact, there is a more fundamental approach to this that enables squeezing extra points from the global effects, but I'm pessimistic about any effect this would have on the final RMSE, so why bother with this?
Yehuda,
Please tell us what this fundamental approach is. I managed to get my global effects RMSE down to 0.9534. I basically added a viewer-week-day effect and re-computed the effects a few times. As you mentioned earlier, this doesn't help SVD at all, but it helps KNN a lot. Running my KNN on the residuals reduced its RMSE by 0.005. So I'm very intrigued by all sorts of global effects now.
Offline
Paeva wrote:
I would be curious to find how well these improved global effects + kNN methods blend with everything else. Not that this is an easy thing to quantify.
So far my combined blending improvement is about 0.002 on probe data, from 0.8954.
Last edited by Newman! (2008-01-29 21:05:39)
Offline