Netflix Prize: Forum

Forum for discussion about the Netflix Prize and dataset.

You are not logged in.

Announcement

Congratulations to team "BellKor's Pragmatic Chaos" for being awarded the $1M Grand Prize on September 21, 2009. This Forum is now read-only.

#1 2007-01-27 18:53:52

winsteps
Member
From: Chicago
Registered: 2006-10-03
Posts: 269
Website

How to tweak your way up the Quiz leaderboard

You have made a submission for a set of ratings {r} and obtained an RMSE of R on the Quiz dataset. Then you may be able to improve your RMSE easily:

N = 2817131 qualifying ratings
QMEAN = 3.674405 is my best estimate of the mean of the Quiz dataset
RMSE of the Quiz dataset around its mean, QRMSE = 1.128735
QVAR = QRMSE^2 = 1.274043

For your submission:
rMEAN = sum(r)/N
rVAR = sum(r^2)/N - rMEAN^2

Adjusting the mean of your submission to match the mean of the Quiz set:
A = QMEAN - rMEAN
so that, an improved submission is {r'} where
r' = r + A
then
rMEAN' = QMEAN
rVAR' = rVAR - A^2
Then your improved R is predicted to be R' = square-root ( R^2 - A^2 )

By adjusting the variance of your submission, R' can be improved more:
r'' = k*(r'-QMEAN) + QMEAN
where
k = (QVAR + rVAR' - R'^2) / (2*rVAR')
rMEAN'' = QMEAN
rVAR'' = k^2 * rVAR'
Then your improved R'' = square-root ( k*R'^2 + QVAR*(1-k) - k*(1-k)*rVAR')

Of course, clamp any submission 1<=r<=5.

Though this will improve your position on the Quiz leaderboard, it is probably not an acceptable algorithm, .... :-)
(Corrections invited.)

Offline

 

#2 2007-01-27 21:40:27

vdicarlo
Member
From: Davis, California
Registered: 2006-10-03
Posts: 78

Re: How to tweak your way up the Quiz leaderboard

Why is this not acceptable?

We are doing something similar, and it is not a huge improvement, but it helps some.

Offline

 

#3 2007-01-27 23:59:28

winsteps
Member
From: Chicago
Registered: 2006-10-03
Posts: 269
Website

Re: How to tweak your way up the Quiz leaderboard

I assumed it was not acceptable because it is computed based on part of the answer. But, on rereading the rules, it does appear that results based on optimizing to the Quiz subset are acceptable.

Of course, we don't know how these shortcuts affect Test RMSEs. The closer the Test subset's mean and s.d. are to the Quiz subset's, the better this method would work overall.

Offline

 

#4 2007-01-28 08:33:26

glass1man
Member
Registered: 2006-12-25
Posts: 20

Re: How to tweak your way up the Quiz leaderboard

Was QMean retreived by guessing the same rating for all entries, and then adjusting that guess until you get an optimal RMSE?

I'm not trying to blow a hole in your approach, i'm just asking.

Offline

 

#5 2007-01-28 09:14:52

zc
Member
Registered: 2006-12-31
Posts: 9

Re: How to tweak your way up the Quiz leaderboard

http://www.netflixprize.com/community/v … php?id=435 shows all 1's RMSE (let it be RMSE1) and all 2's RMSE (RMSE2), which are fairly easy to obtain.

QMean = (RMSE1^2 - RMSE2^2 + 3)/2, which can be deduced from RMSE's formula.

Offline

 

#6 2007-01-28 09:21:06

zc
Member
Registered: 2006-12-31
Posts: 9

Re: How to tweak your way up the Quiz leaderboard

Prizemaster should probably clarify if the use of RMSE from quiz dataset is acceptable or not. Since the final award is based on the test dataset, not quiz dataset, and in real world scenario you can reserve certain amount of known rating data points for the equivalent of quiz dataset here, the algorithm is still valid. But we run the risk of making the test RMSE worse if the stats on quiz subset are significantly different from test subset. The same question applies to earlier suggestions on using quiz RMSEs to figure out the weights to combine two algorithsm.

Offline

 

#7 2007-01-28 11:24:06

winsteps
Member
From: Chicago
Registered: 2006-10-03
Posts: 269
Website

Re: How to tweak your way up the Quiz leaderboard

Glass1man asks : "Was QMEAN retrieved by guessing ..."
Zc is correct that it was obtained by computation. If you look down the list of RMSEs for all 1's, 2's, 3's, 4's and 5's. You will discover they yield a range of possible QMEANs and QRMSEs (due to RMSEs being reported to 4 decimal places.). Linear algebra reveals that the true QMEAN must be in a narrow range. My best estimate is the center of the range.

Offline

 

#8 2007-02-01 01:25:36

glass1man
Member
Registered: 2006-12-25
Posts: 20

Re: How to tweak your way up the Quiz leaderboard

maybe I did this wrong, but I got a really bad result.

I started with an already scored submission:
score: 0.9190
Mean:3.6451
Variance:0.4636

I then tweaked it until it matched the provided mean:
Mean:3.674405
Variance:1.214795 (close to 1.274043)
score:<b>1.0529</b>

Is this a bug on my part, or did i just totally miss the point of this exercise?

in case you want it, here's the java code I used:
    public double getVariance(ArrayList<CustomerMovieGuess> guesses) {
        double average = getAverage(guesses);

        double sum = 0;
        double count = 0;
        for (CustomerMovieGuess guess : guesses) {
            sum += guess.rating * guess.rating;
            count++;
        }

        return (sum / count) - (average * average);
    }

    public double getAverage(ArrayList<CustomerMovieGuess> guesses) {
        double sum = 0;
        double count = 0;
        for (CustomerMovieGuess guess : guesses) {
            sum += guess.rating;
            count++;
        }
        return sum / count;
    }

    public void adjustAverage(ArrayList<CustomerMovieGuess> guesses) {
        double average = getAverage(guesses);

        double offset = QAVG / average;

        for (CustomerMovieGuess guess : guesses) {
            guess.rating *= offset;

            if (guess.rating > 5.0) {
                guess.rating = 5.0;
            }
        }
    }

    public void adjustVariance(ArrayList<CustomerMovieGuess> guesses) {
        double average = getAverage(guesses);
        double var = getVariance(guesses);

        double offset = Math.sqrt(QVAR / var);

        for (CustomerMovieGuess guess : guesses) {
            double diff = guess.rating - average;
            double adjustment = (diff * offset);

            adjustment = Math.min(adjustment, 2.0);
            adjustment = Math.max(adjustment, -2.0);

            guess.rating = average + adjustment;
        }
    }

Offline

 

#9 2007-02-01 04:49:48

winsteps
Member
From: Chicago
Registered: 2006-10-03
Posts: 269
Website

Re: How to tweak your way up the Quiz leaderboard

Glass1man, you got the mean part, but not the variance part. The desired variance is not 1.214795, but
K^2*rVAR'
where
k = (QVAR + rVAR' - R'^2) / (2*rVAR')
so, in your example:
QVAR= 1.274043
rVAR' = 0.4636 - (3.6451 - 3.674405)^2 = 0.4546
R'^2 = 0.9190^2 = .8446 - .009 = 0.8356 (RMSE=0.9141)
so k = .9822
so your new variance is
rVAR'' = k^2*rVAR' = 0.4386 <= this is the variance you want for your predictions
and the predicted RMSE^2 = ( k*R'^2 + QVAR*(1-k) - k*(1-k)*rVAR')
= 0.8355
so predicted RMSE = 0.9140
Not much of a gain, but your RMSE was good to start with!

Last edited by winsteps (2007-02-02 22:57:48)

Offline

 

#10 2007-02-01 15:02:59

ProjectComputing
Member
Registered: 2007-02-01
Posts: 10

Re: How to tweak your way up the Quiz leaderboard

Winsteps is correct and his method works just great as "and end of pipe" treatment if you want to slightly increase your algorithm's scoring power rather than its predictive power.

However it looks like there could be another problem in the application of the theory when using a multiplicative method (as in the java code supplied by glass1man) to adjust the mean rather than the usual additive method. Clipping (usually to 5.0) will happen more frequently. Also the multiplicative mean adjustment will affect the variance, so should never be applied alone, but always before the variance adjustment as detailed by winsteps.

Offline

 

#11 2007-02-02 22:35:44

glass1man
Member
Registered: 2006-12-25
Posts: 20

Re: How to tweak your way up the Quiz leaderboard

ProjectComputing wrote:

Winsteps is correct and his method works just great as "and end of pipe" treatment if you want to slightly increase your algorithm's scoring power rather than its predictive power.

However it looks like there could be another problem in the application of the theory when using a multiplicative method (as in the java code supplied by glass1man) to adjust the mean rather than the usual additive method. Clipping (usually to 5.0) will happen more frequently. Also the multiplicative mean adjustment will affect the variance, so should never be applied alone, but always before the variance adjustment as detailed by winsteps.

you are correct, I couldn't figure out how to do both the mean and to clip at the same time, so I just ran it through 5 iterations or so and it converged.

Offline

 

#12 2007-02-18 12:38:43

Lonnie
Member
Registered: 2006-10-19
Posts: 40

Re: How to tweak your way up the Quiz leaderboard

Has anyone implemented the Winsteps post-processing in C/C++?

I would really like to see the code working if you are willing to share it.

I just can't seem to get it implemented correctly yet.

Cheers,
Lonnie

Offline

 

#13 2007-02-18 17:17:40

skomoroch
Member
From: Washington DC
Registered: 2007-01-09
Posts: 34
Website

Re: How to tweak your way up the Quiz leaderboard

Overtraining using this technique is going to waste a lot of coding time for you guys, I'd direct effort elsewhere.  Even if you were to get close to the grand prize using a particular algorithm, tweaking your predictions with this is more likely to hurt than help on the real evaluation data Netflix holds in reserve.

Offline

 

#14 2007-02-19 12:16:55

winsteps
Member
From: Chicago
Registered: 2006-10-03
Posts: 269
Website

Re: How to tweak your way up the Quiz leaderboard

Skomoroch, you are correct. But most of us don't have a hope of winning a prize - but we certainly relish the bragging-rights of appearing on the Leaderboard ....
And these tweaks require only a few lines of code and little run-time.

Offline

 

#15 2007-02-19 13:08:19

skomoroch
Member
From: Washington DC
Registered: 2007-01-09
Posts: 34
Website

Re: How to tweak your way up the Quiz leaderboard

Don't count yourselves out, I've seen a lot of good ideas from you guys on the forums.  This is a great learning experience, I hope more companies follow the example of Netflix.

Offline

 

#16 2007-02-19 22:28:21

glass1man
Member
Registered: 2006-12-25
Posts: 20

Re: How to tweak your way up the Quiz leaderboard

Lonnie wrote:

Has anyone implemented the Winsteps post-processing in C/C++?

I would really like to see the code working if you are willing to share it.

I just can't seem to get it implemented correctly yet.

Cheers,
Lonnie

I didn't do it with C/C++ but here's my code for java:
(sloppy cut and paste, since it's not all in one file)

public class CustomerMovieGuess {

    public int customer;

    public int movie;

    public double rating;
}

    private static final double QAVG = 3.674405;

    private static final double QVAR = 0.4386;

    public void adjustAverage(ArrayList<CustomerMovieGuess> guesses) {
        double average = getAverage(guesses);

        double offset = QAVG / average;

        for (CustomerMovieGuess guess : guesses) {
            guess.rating *= offset;

            if (guess.rating > 5.0) {
                guess.rating = 5.0;
            }
        }
    }

    public void adjustVariance(ArrayList<CustomerMovieGuess> guesses) {
        double average = getAverage(guesses);
        double var = getVariance(guesses);

        double offset = Math.sqrt(QVAR / var);

        for (CustomerMovieGuess guess : guesses) {
            double diff = guess.rating - average;
            double adjustment = (diff * offset);

            adjustment = Math.min(adjustment, 2.0);
            adjustment = Math.max(adjustment, -2.0);

            guess.rating = average + adjustment;
        }
    }

        public void run() {
        adjustAverage(guesses);
        adjustVariance(guesses);
        adjustAverage(guesses);
        adjustVariance(guesses);
        adjustAverage(guesses);
        adjustVariance(guesses);
        adjustAverage(guesses);
        }

Offline

 

#17 2007-03-18 10:12:09

tron
Member
Registered: 2006-11-10
Posts: 7

Re: How to tweak your way up the Quiz leaderboard

Thanks for all the ideas and input!!

vb.net implementation thereof.  The Mean seems to be working well, but the variance actually made my score worse.  If anyone has suggestions on improvement, please add.

Had a 0.9130 on original, but the variance portion skewed it up to .935.

-----------------------------------------


Imports System.Text
Imports System.IO
Imports System.Math

Module Module1

    Sub Main()
        LoadSubmission()
        adjustAverage()
        adjustVariance()
        ProcessTest()

        Console.WriteLine("DONE")
        Console.ReadLine()
    End Sub

    Private N As Integer = 2817130 'qualifying ratings
    Private QRSME As Double = 0.913 'rsme scrore from netflix on qualifyier
    Private QMEAN As Double = 3.674405 'global average on dataset
    Private QVAR As Double = QRSME * QRSME
    Private TRAINING_PATH As String = "C:\netflix\prevPredictions\"
    Private TEST_PATH As String = "C:\netflix\"
    Private FILE_NAME As String = "prediction.pseudo_0.9130_50.txt"


    Public Structure Data
        Dim MovieId As Short
        Dim Rating As Double
    End Structure
    Public guess(N) As Data


    Public Function getAverage() As Double
        Dim sum As Double = 0
        Dim count As Double = 0
        For i As Double = 0 To N
            sum += guess(i).Rating
            count += 1
        Next
        Return sum / count
    End Function

    Public Function getVariance() As Double
        Dim average As Double = getAverage()
        Dim sum As Double = 0
        Dim count As Double = 0
        For i As Integer = 0 To N
            sum += guess(i).Rating * guess(i).Rating
            count += 1
        Next
        Return (sum / count) - (average * average)

    End Function


    Public Sub adjustAverage()
        Dim rMEAN As Double = getAverage()
        Dim offset As Double = QMEAN / rMEAN
        For i As Double = 0 To N
            guess(i).Rating *= offset
        Next
    End Sub


    Public Sub adjustVariance()
        Dim rMEAN As Double = getAverage()
        Dim var As Double = getVariance()
        Dim offset As Double = Math.Sqrt(QVAR / var)

        For i As Integer = 0 To N
            Dim diff As Double = guess(i).Rating - rMEAN
            Dim adjustment As Double = (diff * offset)
            adjustment = Math.Min(adjustment, 2.0)
            adjustment = Math.Max(adjustment, -2.0)
            guess(i).Rating = rMEAN + adjustment
        Next

    End Sub


    Sub LoadSubmission()
        Dim movieID As Short = -1
        Dim m_nRatingCount As Integer = 0
        Dim s As StreamReader = New StreamReader(TRAINING_PATH + FILE_NAME)
        Do While s.Peek() >= 0
            Dim strTemp As String = s.ReadLine()
            If Not strTemp.IndexOf(":").Equals(-1) Then
                movieID = strTemp.Substring(0, strTemp.Length - 1)
            Else
                guess(m_nRatingCount).MovieId = movieID
                guess(m_nRatingCount).Rating = strTemp
                m_nRatingCount += 1
            End If
        Loop
        s.Close()
    End Sub


    Sub ProcessTest()
        'remove file
        Dim FilePath As String = TEST_PATH + "predictionAjustment.txt"
        Dim fileExists As FileInfo = New FileInfo(FilePath)
        fileExists.Delete()

        'perform export
        Dim movieId As Integer
        Dim output As New StreamWriter(FilePath, False, UnicodeEncoding.Default)

        Console.WriteLine()
        Console.WriteLine("Processing test: ")

        Dim _Guess As Double
   
        For i As Integer = 0 To N
            If Not movieId.Equals(guess(i).MovieId) Then
                movieId = guess(i).MovieId
                output.WriteLine(movieId.ToString + ":")
                _Guess = Round(guess(i).Rating, 2)
                If _Guess > 5 Then _Guess = 5
                If _Guess < 1 Then _Guess = 1
                output.WriteLine(_Guess)
            Else
                _Guess = Round(guess(i).Rating, 2)
                If _Guess > 5 Then _Guess = 5
                If _Guess < 1 Then _Guess = 1
                output.WriteLine(_Guess)
            End If
        Next
        output.Close()
    End Sub

End Module

Offline

 

#18 2007-03-18 14:15:16

winsteps
Member
From: Chicago
Registered: 2006-10-03
Posts: 269
Website

Re: How to tweak your way up the Quiz leaderboard

Tron, please go back and read my posts about the variance adjustment. The purpose is not to match the quiz variance. It is to optimize the local variance. The important lines are:
k = (QVAR + rVAR' - R'^2) / (2*rVAR')
r' = k*(r-rMEAN) + QMEAN

Offline

 

#19 2007-03-19 08:11:34

tron
Member
Registered: 2006-11-10
Posts: 7

Re: How to tweak your way up the Quiz leaderboard

Thanks winsteps!  Your explanation made it all click.  The result set went from a .9130 to .9128.  Every little bit helps.  (Had my Q's and R's  mixed up...)
Here is final implementation:

-------------------


Imports System.Text
Imports System.IO
Imports System.Math

Module Module1

    Sub Main()
        LoadSubmission()
        adjustAverage()
        adjustVariance()
        ProcessTest()
        Console.WriteLine("DONE")
        Console.ReadLine()
    End Sub

    'thanks winsteps http://www.netflixprize.com/community/v … php?id=503
    Private N As Integer = 2817130 'qualifying ratings
    Private QMEAN As Double = 3.674405 'best estimate of the mean of the Quiz dataset
    Private QRMSE As Double = 1.128735 'RMSE of the Quiz dataset around its mean
    Private QVAR As Double = QRMSE * QRMSE
    Private R As Double = 0.913 '<--RSME of YOUR score on the qualifyier from netflix
    Private TRAINING_PATH As String = "C:\netflix\prevPredictions\"
    Private TEST_PATH As String = "C:\netflix\"
    Private FILE_NAME As String = "prediction.pseudo_0.9130_50.txt"

    Public Structure Data
        Dim MovieId As Short
        Dim Rating As Double
    End Structure
    Public guess(N) As Data

    Public Function getAverage() As Double
        Dim sum As Double = 0
        Dim count As Double = 0
        For i As Double = 0 To N
            sum += guess(i).Rating
            count += 1
        Next
        Return sum / count
    End Function

    Public Sub adjustAverage()
        Dim rMEAN As Double = getAverage()
        Dim offset As Double = QMEAN / rMEAN
        For i As Double = 0 To N
            guess(i).Rating *= offset
        Next
    End Sub

    Public Function getVariance() As Double
        Dim rMEAN As Double = getAverage()
        Dim sum As Double = 0
        Dim count As Double = 0
        For i As Integer = 0 To N
            sum += guess(i).Rating * guess(i).Rating
            count += 1
        Next
        Return (sum / count) - (rMEAN * rMEAN)
    End Function

    Public Sub adjustVariance()
        Dim rMEAN As Double = getAverage()
        Dim rVAR As Double = getVariance()
        Dim k As Double = (QVAR + rVAR - (R * R)) / (2 * rVAR)
        For i As Integer = 0 To N
            Dim _r As Double = guess(i).Rating
            guess(i).Rating = k * (_r - rMEAN) + QMEAN
        Next
    End Sub

    Sub LoadSubmission()
        Dim movieID As Short = -1
        Dim m_nRatingCount As Integer = 0
        Dim s As StreamReader = New StreamReader(TRAINING_PATH + FILE_NAME)
        Do While s.Peek() >= 0
            Dim strTemp As String = s.ReadLine()
            If Not strTemp.IndexOf(":").Equals(-1) Then
                movieID = strTemp.Substring(0, strTemp.Length - 1)
            Else
                guess(m_nRatingCount).MovieId = movieID
                guess(m_nRatingCount).Rating = strTemp
                m_nRatingCount += 1
            End If
        Loop
        s.Close()
    End Sub

    Sub ProcessTest()
        'remove file
        Dim FilePath As String = TEST_PATH + "predictionAjustment.txt"
        Dim fileExists As FileInfo = New FileInfo(FilePath)
        fileExists.Delete()

        'perform export
        Dim movieId As Integer
        Dim output As New StreamWriter(FilePath, False, UnicodeEncoding.Default)

        Console.WriteLine()
        Console.WriteLine("Processing test: ")

        Dim _Guess As Double

        For i As Integer = 0 To N
            If Not movieId.Equals(guess(i).MovieId) Then
                movieId = guess(i).MovieId
                output.WriteLine(movieId.ToString + ":")
                _Guess = Round(guess(i).Rating, 2)
                If _Guess > 5 Then _Guess = 5
                If _Guess < 1 Then _Guess = 1
                output.WriteLine(_Guess)
            Else
                _Guess = Round(guess(i).Rating, 2)
                If _Guess > 5 Then _Guess = 5
                If _Guess < 1 Then _Guess = 1
                output.WriteLine(_Guess)
            End If
        Next
        output.Close()
    End Sub

End Module

Offline

 

Board footer

Powered by PunBB
© Copyright 2002–2005 Rickard Andersson