Netflix Prize: Forum

Forum for discussion about the Netflix Prize and dataset.

Announcement

Congratulations to team "BellKor's Pragmatic Chaos" for being awarded the $1M Grand Prize on September 21, 2009. This Forum is now read-only.

#1 2009-07-26 07:28:23

Phil
Member
Registered: 2006-10-09
Posts: 132

My best and worst SVD tricks

I worked out a long list of tweaks to my SVD over the past few years.  They were never enough to bring it down to the numbers that other people were reporting, but they got it below .91 on the probe using 1/5 of the training set and 40 features.  (I was very frustrated that the published results indicated basic SVDs getting RMSEs around .91; in my experience, implementing what those reports described, for instance a basic Funk SVD, got around .93 on the full training set with 40 features.)

What worked best:

- Divide training error by a standard deviation associated with that sample.  This made a big difference, more than .005.  The reasoning is that when your predictor is confident, you adjust it more if it makes an error.  You can try all sorts of ways of estimating a standard deviation, but what worked best for me was

calculate variance uvar of each user's ratings
calculate variance mvar of each movie's ratings
var = mvar*uvar / (mvar + uvar)
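
Here is a rough sketch of how that plugs into the update, in Python (function and variable names are mine, and the learning rate and regularization constants are placeholders, not my actual settings):

import numpy as np

def sample_std(uvar, mvar, eps=1e-9):
    # combine the user's and the movie's rating variances into one
    # per-sample variance, then take its square root
    var = mvar * uvar / (mvar + uvar + eps)
    return np.sqrt(var)

def weighted_sgd_step(u_feat, m_feat, rating, std, lrate=0.001, reg=0.015):
    # dividing the raw error by the per-sample std means confident
    # predictions (small std) get corrected harder when they are wrong
    err = (rating - np.dot(u_feat, m_feat)) / std
    u_old = u_feat.copy()
    u_feat += lrate * (err * m_feat - reg * u_feat)
    m_feat += lrate * (err * u_old - reg * m_feat)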

- Start off training with the users and movies with the most ratings.  This avoids locking in "inversions" in your features, where you get some movies and users that have feature values that are the negative of what they ought to be, and they keep each other stuck in that local optimum.  At least, that was the idea I had when I implemented it, and it worked really well.  Again, this made almost a .005 difference.  I set a threshold for number of ratings on the first epoch, and increased it linearly so that all the ratings were first included on the 20th epoch.  Note that increasing the threshold linearly increases the number of ratings used more than linearly.
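
A sketch of the scheduling, written as a rank cutoff rather than a raw rating-count threshold (the names and the 20-epoch ramp are illustrative; a user_rank or movie_rank of 0 means "most ratings"):

def include_rating(user_rank, movie_rank, epoch, n_users, n_movies, ramp_epochs=20):
    # epoch 1 uses only the most-rated users and movies; the cutoff grows
    # linearly until everything is included by ramp_epochs; since both cutoffs
    # grow together, the set of eligible (user, movie) pairs grows faster than linearly
    frac = min(1.0, epoch / float(ramp_epochs))
    return user_rank < frac * n_users and movie_rank < frac * n_movies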

- Regularize towards the feature mean value instead of towards zero.  This applies only if you have a non-zero mean feature value.  I think that you want to have a positive mean feature value, because you are after all trying to do things to a matrix that can be reliably done only to positive definite matrices.  You might not be able to compute your "SVD" as well if the matrix you're approximating is non-invertible.  Also, to keep things stable, I calculated the expected feature mean from the feature mean that I mandated at feature initialization time, rather than tracking the actual feature mean and trying to regularize towards that.
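
In update form, the only change from ordinary regularization is what you shrink towards; a sketch (numpy arrays, made-up constants, and feature_mean is the value fixed at initialization):

import numpy as np

def mean_regularized_step(u_feat, m_feat, rating, feature_mean, lrate=0.001, reg=0.015):
    # pull features towards a fixed positive mean instead of towards zero
    err = rating - np.dot(u_feat, m_feat)
    u_old = u_feat.copy()
    u_feat += lrate * (err * m_feat - reg * (u_feat - feature_mean))
    m_feat += lrate * (err * u_old - reg * (m_feat - feature_mean))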

- Training all features simultaneously.  I think this is actually performing linear regression by gradient search, and so shouldn't be called PCA or SVD.
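
For contrast with Funk-style training (one feature trained to convergence, then the next), "simultaneously" just means each pass updates every dimension of both vectors at once.  A sketch, with u and m as arrays of user and movie feature vectors (names and constants are illustrative):

import numpy as np

def train_simultaneous(u, m, ratings, epochs=40, lrate=0.001, reg=0.015):
    # ratings is an iterable of (user_index, movie_index, rating) triples;
    # every rating updates all K features of both vectors in one step
    for _ in range(epochs):
        for (uu, mm, r) in ratings:
            err = r - np.dot(u[uu], m[mm])
            u_old = u[uu].copy()
            u[uu] += lrate * (err * m[mm] - reg * u[uu])
            m[mm] += lrate * (err * u_old - reg * m[mm])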

- Boosting: Near the end of a run, train more often on samples with higher error.  (Let the chance of training on a sample be proportional to its error.)  This also probably makes the run a little bit faster, at least if you precompute your random numbers and re-use them.
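
The acceptance rule is the whole trick; a sketch (the max_err of 4 just reflects the 1-5 rating scale):

import random

def train_this_sample(err, max_err=4.0):
    # in the late epochs, accept a sample for an extra update with
    # probability proportional to its current absolute error
    return random.random() < abs(err) / max_err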

- Train only on users who are in the test set.  Simple, but it worked for me, on the probe at least.

- Save the previous epoch's feature values, to go back to if the RMSE goes up (while lowering the learning parameter).  This helps eke out a few more epochs at the end of a run.
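
The bookkeeping is just snapshot, evaluate, and either keep going or restore and shrink; a sketch (the 0.5 shrink factor is a placeholder, not my actual setting):

def end_of_epoch(rmse, best_rmse, features, snapshot, lrate, shrink=0.5):
    # if this epoch made the probe RMSE worse, go back to last epoch's
    # features and lower the learning rate; otherwise take a new snapshot
    if rmse > best_rmse:
        features[:] = snapshot
        lrate *= shrink
    else:
        best_rmse = rmse
        snapshot = features.copy()
    return best_rmse, snapshot, lrate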

- A little simulated annealing: Add a little noise to the features, diminishing as you go on.  This was not a big effect.  I also computed an error term for each feature, which was the sum of products of feature value times training error on each rating, and added noise proportionally to that error.  The difference in error terms from smallest to greatest was only, IIRC, something < 10%.


Things that I never got to work that should have worked:

- Simon Funk pointed out a systematic deviation in predictions, whereby there is a function mapping the prediction output to expected error that is not identically y = 0.  It's very strange that this function can look so statistically significant, and yet fail to generalize beyond the sample you train it on.  Simon said he got it to work.  I never did.

- Working on saturation effects of features.  By this I mean that having one feature that a movie scores a 3 on should not be 3 times as valuable as having 3 features that a movie scores a 1 on.  I expect qualities along single dimensions to have diminishing returns to viewers.  I tried grouping predictions according to the sum of squares of their features, so that predictions based on very uneven feature*feature sums would be in different groups, and learning functions for each group to map prediction into error.  I got functions that were statistically significant at outrageously high levels, yet failed to generalize to the test set.

- Attempts to get at nonlinear interactions using the feature vectors.  When you take the dot product of your movie vector and your user vector, instead of just adding the terms up, you should be able to take that element-wise product vector, throw it into a classifier, and train it on the residual (the error between prediction and rating).  I tried doing that using the FANN neural network library, but I don't know whether I got it to work.  The results varied.  For most settings, postprocessing with the neural network made the RMSE higher.  It made it lower only when working on subsets of the full rating set.  Strange.  I also tried to use a support vector machine for this, but never got any positive results, and it was much too slow.  You could look into SVMperf, a linear-time SVM (but I think it can use only linear kernels, and it doesn't do regression).

- Normalizing ratings in different ways.  I had high hopes for moving ratings onto a scale normalized by converting a rating into the probability that another rating by the same user was less than that rating (actually, p(r2 < r1) + .5p(r2 = r1) ).  This made results worse, I think because you then end up giving more emphasis to error adjustments that end up accounting for less of your RMSE.  (If 2 => .05, 3 => .5, 4 => .7, then suddenly getting those 2's, now those .05's, right looks much more important to your program than getting those 4's right.)  I also looked into Rasch analysis, but never got it to work.
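
For reference, the mapping I tried is simple to write down, per user (a sketch; here r2 is drawn from all of that user's ratings, including the one being mapped):

import numpy as np

def rank_normalize(user_ratings):
    # map each rating r to P(r2 < r) + 0.5 * P(r2 = r)
    r = np.asarray(user_ratings, dtype=float)
    n = float(len(r))
    return np.array([(np.sum(r < ri) + 0.5 * np.sum(r == ri)) / n for ri in r])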

- Adjusting predicted ratings to produce the minimum expected mean square error, using the standard deviation specific to your prediction (see above) and the priors on the ratings.  Mathematically, there are situations where you should round your prediction off a little bit towards the nearest integer in order to reduce MSE; yet it didn't work for me.
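
One way to write down the adjustment I had in mind (the Gaussian noise model around the raw prediction is my assumption for this sketch, and prior is the five rating probabilities for that context): treat the prediction as a noisy observation, combine it with the prior, and output the posterior mean, which is the MSE-minimizing point estimate and pulls the prediction slightly towards nearby integers.

import numpy as np

def mse_optimal_adjustment(pred, std, prior):
    # prior: length-5 array with P(rating = 1), ..., P(rating = 5)
    ratings = np.arange(1, 6)
    lik = np.exp(-0.5 * ((pred - ratings) / std) ** 2)  # likelihood of pred given each true rating
    post = lik * prior
    post /= post.sum()
    return float(np.dot(ratings, post))  # posterior mean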

- Colin Green ("Bored Bitless") noticed a very large (.1 to .2) shift in the ratings at a particular point in time, when the rating instructions on the website changed.  I never got the adjustment for that to contribute as much as I thought it ought to.

- SVD++. [EDIT: Okay, I believe it works.  I couldn't get it to work.]  I've found that almost nothing published during the course of this contest was reproducible by me, certainly not SVD++.  It's a user bias, and user biases don't work for me.  It's so unstable that Bell & Koren must have had to tweak it extensively, and I suspect that it was the tweaking rather than the equations that resulted in the improvement.

- Parallelization.  I reasoned that, if I wanted to run on n processors, I could divide the users into n groups and the movies into n groups, so that the user x movie matrix would look like an n x n grid.  At any one time, you work on n different squares in that grid, chosen so that no two squares are in the same row or column; it takes n steps with n processors to process the entire set once.  For instance:  movie_block = thread number; user_block = (step + thread) % n.  I implemented this with OpenMP, but it gave lousy numeric results, and I never figured out why.  If it was parallelizing only the loop I wanted to parallelize, the computation results should have been the same.  I suspect that OpenMP was parallelizing other loops besides the one I wanted: it lets you say what to parallelize, but not what not to parallelize (a strange semantic restriction keeps you from saying what to do with the loop immediately outside or inside another loop that you've already given directives for).
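
The schedule itself is easy to state (a sketch of the indexing only; the numeric problems I hit were on the OpenMP side, not here):

def block_schedule(n_threads):
    # for each of the n steps, thread t processes movie block t and user block
    # (step + t) % n, so no two active blocks share a row or a column
    for step in range(n_threads):
        yield [(t, (step + t) % n_threads) for t in range(n_threads)]

# e.g. list(block_schedule(3)) ->
# [[(0, 0), (1, 1), (2, 2)], [(0, 1), (1, 2), (2, 0)], [(0, 2), (1, 0), (2, 1)]]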

- Training using the probe set.  Everybody else said their RMSE went down .05 to .1 when they added in the probe set.  I had to hold the probe set out because the only good way I found to set the learning rate was as a function of the probe RMSE, and I monitored that very closely.  In the last few days, my RMSE estimate went to hell when I tried to put most of the probe into the training set, because it was no longer accurate to 6 decimal places.  I probably should have trained without the probe, recorded the sequence of learning rate adjustments, and re-used it when training with the probe.

[EDITED] Speaking of which, I wish people would describe the techniques they use to adjust the learning rate and choose the stopping point.  But I am grateful to them for publishing at all.  smile [/EDITED]

My biggest mistake was using C++ instead of Java.  I at least doubled, and probably tripled, my development time, trying to figure out where the program had crashed (C++ doesn't tell you), trying to figure out what obscure compiler error messages meant, trying to resolve circular linking dependencies, and trying to work out the labyrinthine rules governing all the many special cases in the language (e.g., how to initialize static variables in a template class; under what circumstances a template class needs to declare itself to be its own friend).

- Phil Goetz

Last edited by Phil (2009-11-02 10:35:54)

Offline

 

#2 2009-07-26 08:31:44

YehudaKoren
Member
Registered: 2007-09-23
Posts: 54

Re: My best and worst SVD tricks

Phil,

Thanks for posting these well-thought-out improvements, for the benefit of all of us.
I was sorry to read about your unsuccessful experience with SVD++.
The results I reported for it were obtained with almost no tweaking, and I know that others achieved even better results by tuning the meta-parameters.
You can shoot me an email about this, if you are interested.

Regards,
Yehuda

Offline

 

#3 2009-07-26 09:07:15

Newman!
Member
From: BC, Canada
Registered: 2006-12-26
Posts: 168
Website

Re: My best and worst SVD tricks

Interesting. I've tried something similar to your #1 and #3 with no success, but SVD++ worked great.


When you control the mail, you control... information !

Offline

 

#4 2009-07-26 09:25:49

quadcore
Member
Registered: 2008-03-30
Posts: 40

Re: My best and worst SVD tricks

The change in the meaning of the ratings during the time period was noted in early posts, but not extensively discussed later.  I noted that users who started rating movies before the "change" never really changed their use of the ratings.  Users who started after the "change" used the new ratings and gave slightly higher ratings, as reported.  The "change date" is around 3/11/2004.

I tried to make use of this, but in the end it didn't contribute much. I think the customer biases that are in most of the models handle the difference. It just so happens that the customers who have a higher bias are those who joined Netflix later.

This might have some bearing on time effects.  It also might distort some of the herding phenomenon that has been discussed.  Since the number of users increased significantly over the time period, and newer users gave higher ratings, it would look like movies are getting better, or that people are selecting movies better, or that people are "following the crowd" more.  It would be interesting for those who were studying herding effects to exclude the ratings for customers who joined early, and make sure the trends hold for later users.

Rough (but old) statistics:
67% of probe and qualifying ratings are for users who started rating after the "change". They only account for about 50% of the training set.

Richard Epstein (quadcore)

Offline

 

#5 2009-07-26 09:27:45

Aron
Member
Registered: 2006-10-02
Posts: 186

Re: My best and worst SVD tricks

Hey Phil (from LW\OB)

SVD++ works quite well.  It's one of my favorite algorithms, and there are, as Yehuda mentions, a few ways to improve its performance beyond the claims in their paper just by tweaking a few minor things.  I can get ~.8938 probe with 256 features.

The way I learned how to train it was from the paper at the top here: http://www.netflixprize.com/community/v … hp?id=1043

Your experience with C++ is like my experience with CUDA.  Yeah, it was fun to learn and optimize, but the rather insignificant value of being able to run the code fast did not make up for the difficulty of debugging and developing.  Better to work on subsets of data, or just experiment with fewer features (though conclusions there can occasionally be misleading).

Offline

 

#6 2009-07-26 10:48:14

dale5351
Member
From: Columbia, MD
Registered: 2008-10-18
Posts: 116

Re: My best and worst SVD tricks

Phil wrote:

Speaking of which, people should not publish descriptions of algorithms claiming low RMSEs, and fail to describe the techniques they use to adjust the learning rate and choose the stopping point!  It's just cruel!

I use a variable training rate that adjusts automatically.  Because of that, it does not matter much where I start it.  When I was using the full gradient method, I made adjustments by fitting a quadratic curve in the direction of the gradient and letting the next LR be the peak of that curve.  Now that I have switched to the incremental method of Simon Funk, I no longer have an overall gradient to use in that computation, so I simply went to the method of "if the score improves, increase the LR by a factor of 1.1, but if the score was worse, decrease the LR by a factor of 0.2 and try again from the previous values."

My stopping rule is "stop after LR is less than 1E-8 and have had no improvement for 8 trial steps".
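
In code, the rule is roughly this (a sketch; the names are made up, the constants are the ones I mentioned):

def adjust_lr(lrate, score, best_score, features, snapshot, stalled,
              up=1.1, down=0.2, min_lr=1e-8, patience=8):
    # on improvement: keep the new features and raise LR by 1.1;
    # on a worse score: restore the previous values, cut LR to 0.2x, and try again
    if score < best_score:
        best_score, stalled = score, 0
        snapshot = features.copy()
        lrate *= up
    else:
        features[:] = snapshot
        lrate *= down
        stalled += 1
    stop = (lrate < min_lr) and (stalled >= patience)  # stopping rule
    return lrate, best_score, snapshot, stalled, stop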

And thanks for the very informative post.  It gave me two ideas to try in my model -- not that there is any official time left.

Last edited by dale5351 (2009-07-26 10:52:20)

Offline

 

#7 2009-07-26 11:05:57

SteveC
Member
Registered: 2006-12-12
Posts: 4

Re: My best and worst SVD tricks

Best trick I could come up with is to use the weight-sorted matrices as clusters, combined with removing movies that are not well predicted.   Once you have trained some number of features, sort the user weights, break them into batches, and remove all movies whose prediction rmse is some amount over the training rmse -- 0.25 seemed to work.  I would then repeat the training with batches of the sorted users.  I used ~120k users per batch.  I was able to get an rmse of roughly 0.856 on 1mil of the probe set with only about 10 - 20 features. 

I was not able to get the overall rmse below .91 (a typical value for Funk SVD), but the training time was much faster, the movies that should probably be predicted by other means stand out quickly, and training on the others can be targeted better.

Sadly, this idea came to me a little too late so those last 400k ratings remained elusive and I never tried clustering with other SVD variants.

One variant that seemed promising was to break the movies up into batches and train the most positive and most negative together.

Finally, I did track the training error for the last three months of data versus all other ratings.  The final month was always significantly better than the other periods.

lr/k = 0.003 for initial training and 0.005 for batches.  The initial training was for 25 features

Last edited by SteveC (2009-07-26 11:07:51)

Offline

 

#8 2009-07-26 14:28:16

Phil
Member
Registered: 2006-10-09
Posts: 132

Re: My best and worst SVD tricks

YehudaKoren wrote:

Phil,

Thanks for posting these well thought improvements, for the benefit of all of us.
I was sorry to read on your unsuccessful experience SVD++.
The results I reported for it were obtained with almost no tweaking, and I know that others achieved even better results by tuning the meta parameters.
You can shoot me an email on this, if you are interested.

Regards,
Yehuda

I'm sorry I sounded snippy.  I've barely slept since Wednesday, during which a string of frustrating events occurred, including burning out my motherboard, moving all my code to the Amazon cloud, spending two sleepless days trying to figure out why using 8 CPUs was slower than 1, and realizing that Amazon cloud computers would never, ever let a job run for a full day before killing it.  That last-minute rush happened because I kept delaying my SVD runs, because I was determined to get SVD++ working first.

Your publications have been great, and I really appreciate them.  Using regression to set the nearest-neighbor weights was especially clever.  It must be hard for you to have worked on this contest so long, and have the prize taken from you at the last moment by a .0001 difference.  I was rooting for you.  (After myself, of course.)

BTW, I thought that, if SVD++ worked by introducing a bias based on the movies a user watched, you ought to also be able to introduce a bias based on the movies a user didn't watch.  Could you just add another set of vectors in exactly the same way, except involving unwatched movies?

I also didn't understand why the movies used in SVD++ were preferentially neighbors of the movie being rated, rather than just popular movies.  That may have been my mistake.  I used popular movies, which may have biased them to have higher ratings, giving the sum of y_j a positive mean value, making my SVD++ unstable.

Offline

 

#9 2009-07-26 14:33:49

Phil
Member
Registered: 2006-10-09
Posts: 132

Re: My best and worst SVD tricks

Another thing that didn't work was singling out the "all-5" raters.  There are about, IIRC, 5,000 users who rate movies using almost nothing but 5s.  I tried pulling them out of the dataset and handling them separately, but whatever I tried with them made my RMSE higher.  That really puzzled me.

Offline

 

#10 2009-07-26 14:41:48

YehudaKoren
Member
Registered: 2007-09-23
Posts: 54

Re: My best and worst SVD tricks

Phil wrote:

Your publications have been great, and I really appreciate them.  Using regression to set the nearest-neighbor weights was especially clever.  It must be hard for you to have worked on this contest so long, and have the prize taken from you at the last moment by a .0001 difference.  I was rooting for you.  (After myself, of course.)

Thanks. In fact, this is a very happy day for us - our team is top contender for winning the Grand Prize, as we have a better Test score than The Ensemble. (Probably this is the first post revealing this in the forum smile)

Phil wrote:

BTW, I thought that, if SVD++ worked by introducing a bias based on the movies a user watched, you ought to also be able to introduce a bias based on the movies a user didn't watch.  Could you just add another set of vectors in exactly the same way, except involving unwatched movies?

Maybe - but won't it just duplicate the same signal in a complementary way? Also, an efficient implementation will get trickier (yet possible), as there are many more unrated movies than rated ones.

Phil wrote:

I also didn't understand why the movies used in SVD++ were preferentially neighbors of the movie being rated, rather than just popular movies.  That may have been my mistake.  I used popular movies, which may have biased them to have higher ratings, giving the sum of y_j a positive mean value, making my SVD++ unstable.

Not sure what you refer to here. We don't use "neighbors" or "popular" here. Just all rated movies.

Offline

 

#11 2009-07-26 14:48:04

Lazy Monkey
Member
Registered: 2007-12-13
Posts: 93

Re: My best and worst SVD tricks

Phil:  Thanks for your original post.

Phil wrote:

It must be hard for you to have worked on this contest so long, and have the prize taken from you at the last moment by a .0001 difference.  I was rooting for you.

I saw this and was going to post that you should not count BPC out yet, since we do not know the test RMSEs; while I was getting myself organized, Yehuda came along and said that they have been told they have the lowest test RMSE.

It would be nice if the Prizemaster put up a notice to that effect.

Last edited by Lazy Monkey (2009-07-26 14:48:22)

Offline

 

#12 2009-07-26 14:56:51

Phil
Member
Registered: 2006-10-09
Posts: 132

Re: My best and worst SVD tricks

YehudaKoren wrote:

Phil wrote:

Your publications have been great, and I really appreciate them.  Using regression to set the nearest-neighbor weights was especially clever.  It must be hard for you to have worked on this contest so long, and have the prize taken from you at the last moment by a .0001 difference.  I was rooting for you.  (After myself, of course.)

Phil wrote:

BTW, I thought that, if SVD++ worked by introducing a bias based on the movies a user watched, you ought to also be able to introduce a bias based on the movies a user didn't watch.  Could you just add another set of vectors in exactly the same way, except involving unwatched movies?

Maybe - but won't it just duplicate the same signal in a comlementing way? Also, effcient implementaiton will get trickier (yet possible), as there are many more unrated movies than rated ones.

You would have to have 2 implicit weights per movie, one for "rated" and one for "didn't rate".

I was supposing you used an efficiency trick like the one Gravity and Chih-Chao Ma described, because I didn't think you could achieve the runtimes reported in that 2008 paper with the direct implementation.  Anyway, if it worked, that would be a case where you could probably use just the 500 most popular movies and get most of the benefits.

YehudaKoren wrote:

Phil wrote:

I also didn't understand why the movies used in SVD++ were preferentially neighbors of the movie being rated, rather than just popular movies.  That may have been my mistake.  I used popular movies, which may have biased them to have higher ratings, giving the sum of y_j a positive mean value, making my SVD++ unstable.

Not sure what you refer to here. We don't use "neighbors" or "popular" here. Just all rated movies.

Oh. All this time I was reading N(u) in equation 15 as N^k(i; u).  tongue

Last edited by Phil (2009-07-26 14:57:20)

Offline

 

#13 2009-07-26 14:59:16

Lazy Monkey
Member
Registered: 2007-12-13
Posts: 93

Re: My best and worst SVD tricks

YehudaKoren wrote:

Thanks. In fact, this is a very happy day for us - our team is top contender for winning the Grand Prize, as we have a better Test score than The Ensemble.

You must have had a rough 24 hours and a bad few minutes in there at the end before getting the notification you had the low test rmse.

Congratulations to you and the other members of your team.

Offline

 

#14 2009-07-26 14:59:57

Bold Raved Tithe
Member
Registered: 2006-11-17
Posts: 115

Re: My best and worst SVD tricks

YehudaKoren wrote:

Thanks. In fact, this is a very happy day for us - our team is top contender for winning the Grand Prize, as we have a better Test score than The Ensemble. (Probably this is the first post revealing this in the forum smile)

Congratulations !!!

It says a lot that you managed to take the crown with all your previous methods published and available to your competitors, while your competitors could keep their own methods under a veil of secrecy (not all did, but most did, and that was fair game).

But did you take the crown ?
For that, not only do you need the best score on the test set, but you must also beat the 10% bar.  I guess the answer is yes, since you said it was a happy day for you wink

Offline

 

#15 2009-07-26 15:06:43

YehudaKoren
Member
Registered: 2007-09-23
Posts: 54

Re: My best and worst SVD tricks

Lazy Monkey wrote:

YehudaKoren wrote:

Thanks. In fact, this is a very happy day for us - our team is top contender for winning the Grand Prize, as we have a better Test score than The Ensemble.

You must have had a rough 24 hours and a bad few minutes in there at the end before getting the notification you had the low test rmse.

Congratulations to you and the other members of your team.

Yeah - we had a hell of a day, working like crazy... Then, for 90 minutes, we were getting used to living with losing, till that email came... What a happy ending...

Bold Raved Tithe  wrote:

It says a lot that you managed to take the crown with all your previous methods published and available to your competitors, while your competitors could keep their own methods under a veil of secrecy (not all did, but most did, and that was fair game).

If I were on the other side, I would probably have acted exactly as The Ensemble did. It is just fair game, and everyone is entitled to do their best to get their share. In the end, publishing worked well for us, but I still wonder what I would have said if the ending had been slightly different smile

Offline

 

#16 2009-07-26 22:01:28

Phil
Member
Registered: 2006-10-09
Posts: 132

Re: My best and worst SVD tricks

One thing I was always curious about - did AT&T let you work full-time on the Netflix Prize?

Offline

 

#17 2009-07-26 23:39:27

LMV
Member
Registered: 2008-05-24
Posts: 46
Website

Re: My best and worst SVD tricks

Lazy Monkey wrote:

Congratulations to you and the other members of your team.

Ahem...
Not to spoil the party (too much) but:
1 - What is the point of having "results" which are absolutely f**cking USELESS for all practical purposes?
2 - Is the second team (whoever it may be) really less deserving when the "winning edge" amounts to a .0001 difference in RMSE score?

Offline

 

#18 2009-07-27 00:13:46

Phil
Member
Registered: 2006-10-09
Posts: 132

Re: My best and worst SVD tricks

LMV wrote:

1 - What is the point  of having "results" which are absolutely f**cking USELESS for all practical purposes?

I'm surprised at how useful and practical the winning methods were.  I expected the winners to use a gigantic compute farm with thousands of cores.  The methods reported can all run on a single desktop in a day.

Offline

 

#19 2009-07-27 00:19:39

YehudaKoren
Member
Registered: 2007-09-23
Posts: 54

Re: My best and worst SVD tricks

Phil wrote:

One thing I was always curious about - Did AT+T let you work full-time on the Netflix prize?

Please note that I left AT&T about a year ago, but much of my work on the contest was indeed done while working at AT&T Research.
This kind of research fits well with their specific needs and with their general agenda of scientific innovation.

(Same is true regarding my current employer - Yahoo!)

Offline

 

#20 2009-07-27 04:37:07

LMV
Member
Registered: 2008-05-24
Posts: 46
Website

Re: My best and worst SVD tricks

Phil wrote:

The methods reported can all run on a single desktop in a day.

Sure, one day for each model to blend... roll

Offline

 

#21 2009-07-27 12:06:00

Eoin Lawless
Member
Registered: 2009-07-27
Posts: 3

Re: My best and worst SVD tricks

SVD tricks - did anyone else try this?

My quick and easy way of adding date information to SVD was to add two further sets of features: a pairing of movie and date of rating, and a pairing of user and date of rating.  The hope is that the first captures seasonal effects such as first release, Christmas and Halloween, while the second captures a user's changing tastes over time.  Date information has been added to SVD before, but not, I believe, in as simple a manner.  The dates of the ratings are binned, so that ratings made in the first month in the dataset are in bin one, ratings made in the second month are in bin two, etc.  The expression for the SVD prediction now has two extra terms:

p[u][m] = sum over i of  movie_feature[i][m] * user_feature[i][u]
+ sum over j of  moviedate_feature[j][m] * datemovie_feature[j][d]
+ sum over k of  userdate_feature[k][u] * dateuser_feature[k][d]

I found that I needed only about 10 features for the date aspects, rather than about 200 for the movie/user features.  The learning rule for the new features is very similar, i.e., for the (date, movie) features:

error = r[u][m] - p[u][m][d]

moviedate_feature[k][m] = moviedate_feature[k][m] + learning_rate * error * datemovie_feature[k][d]

datemovie_feature[k][d] = datemovie_feature[k][d] + learning_rate * error * moviedate_feature[k][m]

p = prediction
r = rating
u = user
m = movie
d = date (month of rating)

Blending this and another variant of SVD I experimented with, I got a quiz RMSE of 0.9072.

Congratulations to BPC and the Ensemble.

Last edited by Eoin Lawless (2009-07-27 12:09:27)

Offline

 

#22 2009-07-27 13:54:57

Phil
Member
Registered: 2006-10-09
Posts: 132

Re: My best and worst SVD tricks

What does moviedate_feature[j][m] * datemovie_feature[j][d] mean?  What's the difference between moviedate_feature and datemovie_feature?

Also, why did you decide to use more than one feature for this, and what was the binning for?  I don't understand what you did with your bins.

Offline

 

#23 2009-07-28 01:14:32

Eoin Lawless
Member
Registered: 2009-07-27
Posts: 3

Re: My best and worst SVD tricks

I only examined dates at a month level, rather than a day level, so all the ratings made in one month are considered to be in the same date 'bin'.

datemovie_feature[j][d] is the j'th feature for date d.
moviedate_feature[j][m] is the j'th feature for movie m

Their product is analogous to movie_feature[j][m] * user_feature[j][u] in Funkian SVD.
In Funkian SVD, the product represents user u's response to feature j in movie m (i.e.,
does user u like the high science-fiction content of movie m); in this case I am trying to
represent how a movie's popularity changes over time - perhaps in response to
seasonal effects.

That's the intent anyway wink

I'm looking forward to trying some of the ideas you listed - particularly simultaneous training and training the most-rated users/movies first.

Offline

 

#24 2009-07-28 10:44:21

Phil
Member
Registered: 2006-10-09
Posts: 132

Re: My best and worst SVD tricks

I still don't understand.  For a user/movie pair, you have 1 movie date and 1 user date.  You talk about having 4 types of features: datemovie_feature, moviedate_feature, userdate_feature, and dateuser_feature; and having an array of all these 4 types.

Where do you load the dates into?

The indexing scheme you use, eg

moviedate_feature[k][m]

indicates that a movie has a separate feature value for every possible month.  Does that mean you set it to a 0 for the months the movie didn't come out, and a 1 for the month that it came out?

Offline

 

#25 2009-07-28 11:42:44

Eoin Lawless
Member
Registered: 2009-07-27
Posts: 3

Re: My best and worst SVD tricks

I have three sets of feature pairs: (user, movie), (date, movie) and (user, date). The first is normal Simon Funk SVD. I initialised each value to
sqrt( global_average/total_number_of_features) (plus some small randomness) and
trained them separately.

My prediction for user u rating movie m at date d is
sum over all k (user,movie) features:  user_feature[k][u] * movie_feature[k][m]
+ sum over all k (date, movie) features moviedate_feature[k][m] * datemovie_feature[k][d]
+ sum over all k (user, date) features userdate_feature[k][u] * dateuser_feature[k][d]

To train the (date, movie) set of features I iterate through the full set of ratings
and let  moviedate_feature and datemovie_feature change so as to decrease the
error between my prediction for a rating and its real value. So if certain movies are
much more popular at certain times of the year, this should emerge. It may also
adjust for the jump in average rating in 2004 (though I only just became aware of
that reading Yehuda Koren's Temporal Dynamics paper today and haven't checked it).

The values of the features for dates before a movie came out don't matter, as we're
not making predictions of ratings on dates prior to a movie's release (except for
the possibly artificial ratings of Lord of the Rings that are present from before its release date).

Sorry if I'm not clear - I'll email or post the C code if that helps.

Offline

 
