Netflix Prize: Forum

Forum for discussion about the Netflix Prize and dataset.


Announcement

Congratulations to team "BellKor's Pragmatic Chaos" for being awarded the $1M Grand Prize on September 21, 2009. This Forum is now read-only.

#1 2008-06-03 07:12:00

sogrwig
Member
Registered: 2007-01-31
Posts: 189

SVD and initialization variables

I've lately been messing around with SVD (or MF, or whatever it's called) and I've been reading past forum posts. I'm really impressed with it and with how low it can go (RMSE low, that is!).

Forum members Clueless and Bored-Bitless, among others, have reported quiz scores below 0.90 and were kind enough to share their ideas.

You can find some excellent ideas in this post:

http://www.netflixprize.com/community/v … php?id=778

...where Clueless sets goals and reports his progress.


I have been experimenting with a vanilla SVD method (TD style) but have lately progressed to improving it with batch training of the features, and many more modifications are on the way. The problem is that an extensive training run requires many hours, each run depends TIGHTLY on the regularization value, learning rate, number of features, min/max epochs, etc., and every different version of the SVD algo performs differently with different starting variables.

I was wondering if I could save myself from running hundreds of tests. Is anyone willing to share a description of the best algorithm they have followed, along with the init variables (LRATE, MIN/MAX epochs, K, etc.) and how well it performs?

I totally appreciate all the help this forum has provided. I'm grateful.

Offline

 

#2 2008-06-04 00:21:52

sogrwig
Member
Registered: 2007-01-31
Posts: 189

Re: SVD and initialization variables

I'll post last night's experiment, which is nothing really special, but I have only just started struggling with SVD.


A Simple Simon SVD (like TD's code) but with all features trained simultaneously:

Code:

#define TOTAL_FEATURES  50
#define MIN_IMPROVEMENT 0.0001     // Minimum improvement required to continue
#define INIT_SEED       0.1
#define INIT ((rand()/(float)(RAND_MAX))*INIT_SEED - (rand()/(float)(RAND_MAX))*INIT_SEED)    // Random feature init, triangular on [-INIT_SEED, +INIT_SEED]
#define LRATE           0.001      // Learning rate parameter
#define K               0.001      // Regularization parameter used to minimize over-fitting
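
For the record, those constants feed the usual per-rating gradient step. Roughly, as a sketch with illustrative array names (not my exact code):

Code:

/* One SGD sweep over the ratings, all features updated together. */
for (int r = 0; r < num_ratings; r++) {
    int   u = rating_user[r], m = rating_movie[r];
    float p = 0.0f;
    for (int f = 0; f < TOTAL_FEATURES; f++)
        p += user_f[u][f] * movie_f[m][f];               /* prediction = dot product */
    float err = (float)rating_value[r] - p;
    for (int f = 0; f < TOTAL_FEATURES; f++) {
        float uf = user_f[u][f], mf = movie_f[m][f];
        user_f[u][f]  += LRATE * (err * mf - K * uf);    /* regularized gradient step */
        movie_f[m][f] += LRATE * (err * uf - K * mf);
    }
}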

In the 34th epoch it went as low as:

    <set x='34' y='0.732905' probe='0.939406' />

and then the probe RMSE started climbing.

If you want to share your best SVD variation with its initialization variables, I would greatly appreciate it.

Offline

 

#3 2008-06-04 12:05:56

Gerald
Member
Registered: 2006-10-05
Posts: 43

Re: SVD and initialization variables

sogrwig,

I suggest initializing your features to sqrt(global_average / number_of_features) + small_random_value as described in this thread:

http://www.netflixprize.com/community/v … php?id=813

You should get much better results. :)

-Gerald

Last edited by Gerald (2008-06-04 12:06:54)

Offline

 

#4 2008-06-04 14:01:09

pipa
Member
Registered: 2008-05-30
Posts: 18

Re: SVD and initialization variables

Gerald wrote:

sogrwig,

I suggest initializing your features to sqrt(global_average / number_of_features) + small_random_value as described in this thread:

http://www.netflixprize.com/community/v … php?id=813

You should get much better results. :)

-Gerald

Nice thread, thanks. I too had run some experiments with initializing the features, but the results were confusing. Using (sqrt(average / num_features) + salt) was giving me poorer results on a simple example.

Anyway, I have run a few more tests now and found that you are right. Like kirimaru says, what seems to work better for simultaneous training of all features is this:

Code:

#define INIT_SEED       sqrtf(GLOBAL_AVERAGE/(float)TOTAL_FEATURES)   // Initialization value for features
#define INIT_VARIANCE   0.05          // variance range from the INIT_SEED value
#define INIT (INIT_SEED + (2.0*(rand()/(float)(RAND_MAX)) - 1.0)*INIT_VARIANCE) // i.e. INIT_SEED + rand[-INIT_VARIANCE, +INIT_VARIANCE]
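
(The logic, as I understand it: with every feature initialized near sqrt(GLOBAL_AVERAGE/TOTAL_FEATURES), the dot product of a user vector and a movie vector starts out at roughly the global average rating, so the very first predictions are already in the right ballpark.)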

Last edited by pipa (2008-06-04 14:08:03)

Offline

 

#5 2008-06-04 15:40:28

ch
Member
Registered: 2008-01-02
Posts: 11

Re: SVD and initialization variables

We initialize SVD features with very small values, e.g. uniform +/-0.001, and get an RMSE around 0.909 on the probe set with 100 dimensions by training on movie means with simple stochastic gradient descent and early stopping.
A small learning rate leads to longer runs but in general performs better on probe; regularization can be set to 0.01.

Team BigChaos

Offline

 

#6 2008-06-05 01:38:38

sogrwig
Member
Registered: 2007-01-31
Posts: 189

Re: SVD and initialization variables

Thanks to team BigChaos for the suggestions.


Yesterday's progress (gradient descent, updating all features simultaneously):
----------------------------
TOTAL_FEATURES 30
MIN_IMPROVEMENT 0.00001
INIT_SEED sqrt(G.AVERAGE / TOTAL_FEATURES) + rand[-0.005, +0.005]
LRATE 0.001
K 0.005  // regularization factor


After 58 epochs this run gave:

    <set x='58' y='0.755023' probe='0.915389' />

in about 2 hours.


I'm wondering what the relation is between the regularization factor and the number of features chosen. If I increase the number of features, would it be better if I also increased the regularization factor? Do smaller values for the factor always lead to better final RMSEs? Has anybody made any observations regarding their relation?

I'm now running it again with the same values, except I have increased the features to 80. I will report the probe RMSE when I get it. Any other suggestions? I plan to implement biases to see how much they improve the score.

Offline

 

#7 2008-06-05 10:08:59

sogrwig
Member
Registered: 2007-01-31
Posts: 189

Re: SVD and initialization variables

New run is SLIGHTLY better:

------------------------
TOTAL_FEATURES 80
MIN_IMPROVEMENT 0.00001
LRATE 0.001
K 0.005 // regularization factor
------------------------

     <set x='48' y='0.714864' probe='0.914543' /> time: 365.000000 sec

It took 5 hours to compute.


I have a feeling that even without the biases it can go lower, although I'm not sure at all (I must have seen it in a thread somewhere around here, but I don't really remember).

Offline

 

#8 2008-06-08 01:22:30

sogrwig
Member
Registered: 2007-01-31
Posts: 189

Re: SVD and initialization variables

Simple Simon (gradient descent) with all features trained simultaneously:

------------------------
TOTAL_FEATURES 100
LRATE 0.001
K 0.01
------------------------


     <set x='75' y='0.702074' probe='0.911148' />


Almost 9 hours to complete....

Offline

 

#9 2008-06-08 11:54:39

Gerald
Member
Registered: 2006-10-05
Posts: 43

Re: SVD and initialization variables

These are my Simple Simon (gradient descent) results, with all features trained simultaneously:

------------------------------------------------------------------
K             = 0.02
LRATE         = 0.001
FEATURES      = 64
PROBE RMSE    = 0.9096
TRAINING TIME = 15 mins
INIT          = sqrt(global_avg / max_features) + small_random
------------------------------------------------------------------

------------------------------------------------------------------
K             = 0.02
LRATE         = 0.001
FEATURES      = 128
PROBE RMSE    = 0.9074
TRAINING TIME = 30 mins
INIT          = sqrt(global_avg / max_features) + small_random
------------------------------------------------------------------

------------------------------------------------------------------
K             = 0.02
LRATE         = 0.001
FEATURES      = 256
QUIZ RMSE     = 0.8983
PROBE RMSE    = 0.9063
TRAINING TIME = 1 hr
INIT          = sqrt(global_avg / max_features) + small_random
------------------------------------------------------------------

------------------------------------------------------------------
K             = 0.02
LRATE         = 0.001
FEATURES      = 1024
PROBE RMSE    = 0.9061
TRAINING TIME = 4 hrs
INIT          = sqrt(global_avg / max_features) + small_random
------------------------------------------------------------------

As far as I can tell, regularization has the biggest impact on RMSE (based on other results not shown above).  0.906 (probe) and 0.898 (quiz) seem to be about the most I can get out of Simon's SVD, though I suspect more can be squeezed from it.  Check out Simon's supplemental blog entry: http://sifter.org/~simon/journal/20070817.html.  I think he scored somewhere around 0.892 using a new technique based on a regularization value per observed rating that is inversely proportional to the number of ratings that were cast.  However, I have yet to implement it successfully.  If someone figures it out, I'd love to hear about it. :)
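
If I'm reading him right, the regularization term would be scaled per rating, something like this (an untested sketch of my interpretation; K_BASE and the count arrays are made-up names):

Code:

/* Untested sketch, inside the usual per-rating update where err is the
   prediction error: regularization inversely proportional to rating counts. */
float k_u = K_BASE / (float)user_rating_count[u];   /* #ratings cast by user u   */
float k_m = K_BASE / (float)movie_rating_count[m];  /* #ratings cast for movie m */
float uf  = user_f[u][f], mf = movie_f[m][f];

user_f[u][f]  += LRATE * (err * mf - k_u * uf);
movie_f[m][f] += LRATE * (err * uf - k_m * mf);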


-Gerald

Last edited by Gerald (2008-07-12 08:50:15)

Offline

 

#10 2008-06-08 12:51:02

sogrwig
Member
Registered: 2007-01-31
Posts: 189

Re: SVD and initialization variables

Wow! Thanks, Gerald...

This is exactly what I have been looking for...
I just tried this:

-----------------------
TOTAL_FEATURES 120
LRATE 0.001
K 0.01
-----------------------

And I only got:

     <set x='74' y='0.693048' probe='0.911085' /> time: 584.000000 sec

100 features gave 0.9111 and 120 features only a 0.0001 improvement... So I have to give it a go now with your init... Thanks!

Did you really try 1024 features?!!

Last edited by sogrwig (2008-06-08 12:52:16)

Offline

 

#11 2008-06-08 13:19:08

quadcore
Member
Registered: 2008-03-30
Posts: 40

Re: SVD and initialization variables

Gerald,

Nice results.

Are you starting from the raw data, from double-centered data, or from the BellKor globals (or something else)? If you are using globals, what is your RMSE before you run the SVD?

-quadcore

Offline

 

#12 2008-06-08 13:23:56

Gerald
Member
Registered: 2006-10-05
Posts: 43

Re: SVD and initialization variables

sogrwig,

I think you'll also get better results if you set K=0.02 when training more than about 10 features.  Also, when K=0.02, you have to train for about 155 epochs instead of ~75.

Yes, I trained 1024 features a while ago.  As you can see, it didn't help much.  Again, I think the key is regularization. 

Let us know how your results turn out.

Offline

 

#13 2008-06-08 13:37:14

Gerald
Member
Registered: 2006-10-05
Posts: 43

Re: SVD and initialization variables

quadcore wrote:

Gerald,

Nice results.

Are you starting from the raw data, from double-centered data, or from the BellKor globals (or something else)? If you are using globals, what is your RMSE before you run the SVD?

-quadcore

I'm starting from raw data.  RMSE prior to running SVD varies depending on how many features are being used. 

1 feature = 1.12
10 features = 1.14
50 features = 1.26
100 features = 1.40
255 features = 1.74

Offline

 

#14 2008-06-08 13:45:30

sogrwig
Member
Registered: 2007-01-31
Posts: 189

Re: SVD and initialization variables

Gerald wrote:

sogrwig,
Also, when K=0.02, you have to train for about 155 epochs instead of ~75.

Well... I'm not really counting epochs with simultaneous training of all the features.

I constantly monitor the probe RMSE while training, without even using an RMSE limit like MIN_IMPROVEMENT. I just use:

  while (probe_rmse_current <= probe_rmse_last) ...


I ran a few tests with 10 features and noticed that once the probe RMSE starts climbing... it never goes down again (in simultaneous feature training, that is). So I save the number of epochs at which the probe RMSE stops falling and use that number for the qualifying training as well.
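
In code terms the outer loop is roughly this (a sketch; train_one_epoch() and probe_rmse() stand in for my actual routines):

Code:

float last = 1e9f, now;
int   epoch = 0, best_epochs = 0;

for (;;) {
    train_one_epoch();        /* one pass over all training ratings    */
    now = probe_rmse();       /* score the current model on the probe  */
    epoch++;
    if (now > last)           /* probe RMSE started climbing: stop     */
        break;
    last = now;
    best_epochs = epoch;
}
/* ...then reuse best_epochs for the training run behind the qualifying
   predictions. */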

Offline

 

#15 2008-06-09 09:47:44

sogrwig
Member
Registered: 2007-01-31
Posts: 189

Re: SVD and initialization variables

Gerald wrote:

------------------------------------------------------------------
K             = 0.02
LRATE         = 0.001
FEATURES      = 64
PROBE RMSE    = 0.9096
TRAINING TIME = 2 hrs
INIT          = sqrt(global_avg / max_features) + small_random
------------------------------------------------------------------

Superb! Almost perfect match. This is what I got with these values:

     <set x='144' y='0.726203' probe='0.909726' />


The thing is that it ran for 11 hours... It runs on a 3.2GHz core; I have not
implemented any kind of multi-threading ...and NO caching. :(

I'm not sure how to implement caching with simultaneous feature training. How did you make it run in 2 hours?

Offline

 

#16 2008-06-09 11:00:38

Clueless
Member
From: Maryland
Registered: 2007-10-09
Posts: 128
Website

Re: SVD and initialization variables

A few comments:

1) The Funk-style SVD, whether training features simultaneously or not, is sensitive to the order of the training data.  Adding biases seems to help with this (a little), as does changing the minimization method (I'm using a slightly different gradient method).  While this is clearly an algorithmic weakness, it can be used to your advantage.  It allows you to get a small RMSE improvement by "bagging" - training more than one SVD with the data in a different order for each, and combining the results.  This is more difficult than it sounds, because changing the data order also changes the ideal learning and regularization rates.  Also, just making random changes to the data order doesn't seem to help as much as ordering it based on some pseudo-logical criteria (e.g. by movie and/or customer variance, or by the amount of "support" for a given sample, or by rating date, or whatever).

2) So far I haven't found any data pre-processing that helps the final SVD RMSE.  As stated in another thread, double-centering, or removal of "global effects", is roughly the same as pre-computing features.  In the long run, doing this appears to hurt RMSE rather than help it, but for low-rank SVDs it can help.  For example: a 10-feature SVD on double-centered data seems to be roughly equivalent to a 12-feature SVD on raw data - but by feature 30 or so the double-centered version is worse than the raw data version.

3) If you want to do an interesting experiment, try this...

  a) Run a 10-feature SVD (with probe data removed) on raw data using whatever methods/parameters you've been  using.  Most people probably have one of these sitting around somewhere.  Score this vs. the probe set (I got 0.9204).

  b) Now convert the raw data to Z scores as follows

new_score = (original_score - 3.603307)/1.084572

to save time I set up a conversion matrix like this...

1 = -2.400308
2 = -1.478286
3 = -0.556263
4 = 0.365760
5 = 1.287783

run another 10-feature SVD (using the same parameters that you used in step (a)) on the Z scores.  Score this vs. the probe set (to generate the final score, use: 1.084572*dot_product+3.603307, where dot_product is the usual SVD "prediction").  Do you get a lower probe score?  I did (0.9166).  Cool huh?

  c) Repeat (a) and (b) with 30 features.  Now which one is better?  Hmmm...

I tried this early in my experimentation, using a more-or-less vanilla Funk/gradient method, and it's always possible that there was a flaw in my code somewhere, but the result seems to be repeatable.
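
In code, the conversion there and back is just this (an illustrative sketch):

Code:

#define GLOBAL_MEAN 3.603307f
#define GLOBAL_SD   1.084572f

/* Convert a raw 1-5 rating to a Z score before training... */
float to_z(float rating) { return (rating - GLOBAL_MEAN) / GLOBAL_SD; }

/* ...and map the SVD dot product back to the rating scale when scoring. */
float from_z(float dot)  { return dot * GLOBAL_SD + GLOBAL_MEAN; }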

Offline

 

#17 2008-06-09 13:48:49

sogrwig
Member
Registered: 2007-01-31
Posts: 189

Re: SVD and initialization variables

Clueless wrote:

c) Repeat (a) and (b) with 30 features.  Now which one is better?  Hmmm...

Sorry, I didn't quite get that. Do you mean that all this improvement vanished when you increased the features?

Offline

 

#18 2008-06-09 13:53:04

Gerald
Member
Registered: 2006-10-05
Posts: 43

Re: SVD and initialization variables

sogrwig wrote:

Gerald wrote:

------------------------------------------------------------------
K             = 0.02
LRATE         = 0.001
FEATURES      = 64
PROBE RMSE    = 0.9096
TRAINING TIME = 2 hrs
INIT          = sqrt(global_avg / max_features) + small_random
------------------------------------------------------------------

Superb! Almost perfect match. This is what I got with these values:

     <set x='144' y='0.726203' probe='0.909726' />


The thing is that it ran for 11 hours... It runs on a 3.2GHz core; I have not
implemented any kind of multi-threading ...and NO caching. :(

I'm not sure how to implement caching with simultaneous feature training. How did you make it run in 2 hours?

I use a quad-core 3.3GHz CPU with multi-threaded code to utilize all 4 cores, and SSE for the floating point calculations.  Also, all data is ordered and aligned for maximum performance.  I don't do any caching either.
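
To give a flavor of the SSE part: the feature dot product is the obvious candidate. A simplified sketch (not my production code; it assumes 16-byte-aligned arrays and a feature count divisible by 4):

Code:

#include <xmmintrin.h>

float dot_sse(const float *a, const float *b, int n)
{
    __m128 acc = _mm_setzero_ps();
    for (int i = 0; i < n; i += 4)        /* 4 floats per SSE step */
        acc = _mm_add_ps(acc, _mm_mul_ps(_mm_load_ps(a + i), _mm_load_ps(b + i)));
    float t[4];
    _mm_storeu_ps(t, acc);                /* spill and sum the 4 lanes */
    return t[0] + t[1] + t[2] + t[3];
}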

Last edited by Gerald (2008-06-09 13:57:57)

Offline

 

#19 2008-06-09 19:07:31

Clueless
Member
From: Maryland
Registered: 2007-10-09
Posts: 128
Website

Re: SVD and initialization variables

sogrwig wrote:

Clueless wrote:

c) Repeat (a) and (b) with 30 features.  Now which one is better?  Hmmm...

Sorry, I didn't quite get that. Do you mean that all this improvement vanished when you increased the features?

Yep.  And by 40 features the raw data outperforms the "cooked" data.  My supposition is that some of the information is actually removed by converting to Z scores (even though all it really does is shrink the values towards the global mean) and that this causes the SVD to under-perform.  Further experiments seem to show that this is true for double-centering as well, and mostly true for removal of "global effects", although, depending on the effect, this isn't as clear cut.

Offline

 

#20 2008-06-10 11:02:32

sogrwig
Member
Registered: 2007-01-31
Posts: 189

Re: SVD and initialization variables

Progress update and some questions.

I have not made another run with more features yet, but I did modify the code to use customer and movie biases. The first run gives:

------------------------
INIT  = sqrt(global_avg / max_features) + small_random
TOTAL_FEATURES  40
LRATE           0.001
K               0.02          // training regularization
LAMDA           0.05          // biases regularization
------------------------

     <set x='147' y='0.744870' probe='0.911658' />

(biases are initialized as 0.0 and are regularized with LAMDA, not K)

Does anybody have any suggestions about regularization values and learning rates?

Last edited by sogrwig (2008-06-10 11:04:45)

Offline

 

#21 2008-06-10 12:51:56

Clueless
Member
From: Maryland
Registered: 2007-10-09
Posts: 128
Website

Re: SVD and initialization variables

sogrwig wrote:

Progress update and some questions.

I have not made another run with more features yet, but I did modify the code to use customer and movie biases. The first run gives:

------------------------
INIT  = sqrt(global_avg / max_features) + small_random
TOTAL_FEATURES  40
LRATE           0.001
K               0.02          // training regularization
LAMDA           0.05          // biases regularization
------------------------

     <set x='147' y='0.744870' probe='0.911658' />

(biases are initialized as 0.0 and are regularized with LAMDA, not K)

Does anybody have any suggestions about regularization values and learning rates?

This is heavily dependent on your implementation.  Back when I was using a more standard (i.e. Funk-style) gradient method, I used a "learning rate" of 0.0009 and a regularization factor of 0.022 for the biases.  My best 40-feature RMSE (scored vs. probe) using that method was 0.9102.  My current method gets me down to 0.9089 with 40 features.

By the way, instead of using "sqrt(global_avg/max_features) + small_random" for initial values, I pull from a Gaussian distribution centered around sqrt(global_avg/max_features) with a very small standard deviation.  This blog entry (http://www.logarithmic.net/pfh/blog/01176798503) offers a different method.
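
A Box-Muller sketch of that kind of init, if it helps (illustrative, not my exact code):

Code:

#include <math.h>
#include <stdlib.h>

/* Draw from a Gaussian with the given mean and (small) standard deviation. */
float gauss_init(float mean, float sd)
{
    float u1 = (rand() + 1.0f) / ((float)RAND_MAX + 2.0f);  /* in (0,1), avoids log(0) */
    float u2 = rand() / (float)RAND_MAX;
    return mean + sd * sqrtf(-2.0f * logf(u1)) * cosf(2.0f * 3.14159265f * u2);
}

/* e.g.  feature[i] = gauss_init(sqrtf(global_avg / max_features), 0.005f); */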

Last edited by Clueless (2008-06-10 12:59:57)

Offline

 

#22 2008-06-10 14:50:22

sogrwig
Member
Registered: 2007-01-31
Posts: 189

Re: SVD and initialization variables

Clueless wrote:

This is heavily dependent on your implementation.

Thanks Clueless. My implementation is this one:

Code:

For i=1 to number_of_iterations
  For j=1 to number_of_ratings

    Calculate error in current prediction
    Update biases

    For k=1 to number_of_features
      Update movie-user vectors (exactly simon's equations)
    End For
  End For
End For

And the code for the biases is

Code:

    c_bias[custId]  += (LRATE * (err - LAMDA * c_bias[custId]));    /* note the [custId] index in the decay term  */
    m_bias[movieId] += (LRATE * (err - LAMDA * m_bias[movieId]));   /* likewise [movieId]; the bare array name would not compile */

One other question is... how low can this method go with a "reasonable" number of features? If you don't mind sharing your init variables, that is. Thanks.

Last edited by sogrwig (2008-09-15 10:04:29)

Offline

 

#23 2008-06-11 09:42:57

Clueless
Member
From: Maryland
Registered: 2007-10-09
Posts: 128
Website

Re: SVD and initialization variables

I want to stress that these numbers are based on old experiments, and that my current implementation is significantly different.  Nevertheless, using the following method...

Code:

for (i=0; i<N_ITERATIONS; i++) {
    for (r=0; r<N_RATINGS; r++) {
        for (f=0; f<N_FEATURES; f++) {
            update feature vectors;
            if (biases are still changing sufficiently) {
                update biases;
            }
        }
    }
}

I got good results with the numbers I previously posted (bias learning rate = 0.0009, bias regularization = 0.022).  Note that the main difference between this implementation and yours is that I updated the biases inside the feature loop (once per feature, rather than once per rating), and stopped updating them once the values stopped changing.  This was computationally more expensive in the short run, but seemed to give slightly better results than a single bias update either before or after the feature vector updates.

One other question is... how low can this method go with a "reasonable" number of features?

As I said, my current implementation is significantly different, but using that method I did run an 80-feature SVD (bias learning rate=0.0009, bias regularization=0.03) which resulted in a probe RMSE of 0.9059.  I never submitted any of the results of this method to Netflix for scoring, nor did I go beyond 80 features (I didn't have enough RAM at the time), so I can't really answer that question.

Last edited by Clueless (2008-06-11 09:43:20)

Offline

 

#24 2008-06-11 12:10:17

sogrwig
Member
Registered: 2007-01-31
Posts: 189

Re: SVD and initialization variables

Thanks Clueless. This is what I got with all features trained simultaneously and biases:


TOTAL_FEATURES  80
LRATE           0.001         // Learning rate parameter
K               0.02          // Regularization parameter used to minimize over-fitting
LAMDA           0.05          // Biases regularization

(my biases' learning rate was the same as the features' LRATE)


     <set x='143' y='0.717586' probe='0.909460' />


I will run it with your init values and no other changes to see how close we are.

Last edited by sogrwig (2008-06-11 12:11:38)

Offline

 

#25 2008-06-11 13:05:25

Clueless
Member
From: Maryland
Registered: 2007-10-09
Posts: 128
Website

Re: SVD and initialization variables

sogrwig wrote:

I will run it with your init values and no other changes to see how close we are.

According to my notes I used the following initial values for my 80-feature run...

movie biases: 0.57
customer biases: 0.03
movie features: 0.16
customer features: 0.19

I didn't add any random perturbation, Gaussian or otherwise.  The number of iterations was 155.  The final training RMSE was 0.7167, and the probe RMSE was 0.9059.

Let me know how yours comes out.

Offline

 
