Forum for discussion about the Netflix Prize and dataset.
You are not logged in.
Hi, I have been trying to understand the stochastic gradient descent method used to solve the SVD like model proposed by Simon Funk and others after him. I would like to understand it with a real time case so as to facilitate better understanding.
Consider that there is a new user who provides his first rating for a particular movie. Now, based on this, I would try and learn the user features using stochastic gradient descent and minimising the mean squared error on all known ratings. Have I got it right so far??
Now, I can use the features to generate a prediction for him. Assume he rates another movie thereafter. Now, with this new rating, would I try to modify his features further or stop at the first attempt and assume that those are the user's final features. So,basically, applied to the netflix dataset, I guess I want to know if after training a particular user's features using his first rating, would I continue doing so for every single rating of his in the dataset so as to best capture his preferences. (Note: I am not taking into account temporal effects at this point).
Someone kindly help me out as I have been struggling to understand this for the past week. Thanks in advance.
Avinash
Offline
You would train the features on all the user ratings available. Clearly, the training works better for users who have rated a lot of movies. If the contest only included users who rated 80 or more movies, it would have been over quickly.
An interesting aside, would have been to investigate whether this difference is purely due to the number of user ratings available, or whether users who rate more movies are inherently more consistent/predictable in their ratings. This could be tested by only training on the first 'n' ratings by any user and see if the accuracy of your guesses differs based on total ratings by each user.
Since users with very few ratings are more difficult to predict, some algorithms average the user based predictation with an "average" user (usually weighted based on the number of ratings the user made).
In a real-life situation, you might want to retrain your features as the user adds ratings.
-quadcore
Offline
Thanks quadcore, that makes sense. And in that case, it brings into question another important point which been wanting to discuss. First of all, I want to know if for checking convergence, is the root mean squared error of all known ratings in the dataset considered or only the ones for which training has been done. I would tend to think its the latter as in a real-time scenario, the ratings appear in a sequence and ideally, I would like to train or retrain a user's and movie's features as soon as a rating is provided by the user for that movie. Is this thought process right??
Also, more importantly, if I have to write an algorithm for this, everytime I update, say a particular user's features using a rating provided by him and want to check for the convergence of the RMSE, if what I have assumed above is true, then the predicted rating, and consequently the error would be updated for not just for that particular rating but also for any other rating provided by the same user in the set of ratings for which RMSE is computed so as to check for convergence.
Kindly let me know if my line of reasoning is correct. Thanks in advance.
Avinash
Offline
There are probably multiple ways to quickly recompute the SVD after adding a new rating. So depending on the approach you take, the set of predictions & ratings used to evaluate convergence might differ.
For example, if you wanted the highest accuracy regardless of speed, you'd need to retrain all user and movie features. The (tiny) impact of a new user rating would ripple through all the features, and thus all predictions. Therefore, you'd have to evaluate RMSE & convergence using the whole dataset. This could be slow, but then again it might not be --- you'd be starting from a set of features you converged on previously, and the features probably don't change much when you add a small number of new ratings, so re-convergence might not take long at all.
Alternatively, to speed things up, you might just train only the new user's features. This might be justified since adding a few more ratings from a new user would probably perturb the existing features by a miniscule amount, which you could round to 0. With this approach, as you said, you'd check for convergence using only the new user's prediction(s) & rating(s). This is faster to evaluate, but the predictions will be less accurate in part because the movie features & other user features won't incorporate the information contained in the new ratings you've added. If inaccuracies built up over time as you add more ratings, you could 'correct' the features by recomputing all features, as quadcore hinted above.
Finally, after googling "incremental SVD," I just found this paper about an incremental SVD algorithm. It's yet another approach that can "fold in" new data into an existing pre-computed SVD (see section 3 and figure 1). I haven't tried it, but it looks promising, too.
Last edited by chef-ele (2010-01-25 11:41:56)
Offline