Netflix Prize: Forum

Forum for discussion about the Netflix Prize and dataset.

Announcement

Congratulations to team "BellKor's Pragmatic Chaos" for being awarded the $1M Grand Prize on September 21, 2009. This Forum is now read-only.

#1 2009-05-12 21:31:24

vzn
Member
Registered: 2006-10-04
Posts: 109

interview, new #2 team, GPT, grand prize team

hi all. GPT, grand prize team, has very graciously agreed to do an email interview wrt some basic questions about their strategy. GPT, in case you havent noticed, has shot to the top of the leaderboard in a very short time; it was started by Takacs et al only at the beginning of 2009. so, less than a half year. as someone else just told me in email, "their success is amazing".

[as I write this [for the record], GPT is in 2nd place with 9.64% improvement. BellKor/BigChaos are tied at 1st place with PragmaticTheory at 9.65%.]
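
For readers unfamiliar with the leaderboard figures: the "% improvement" is, as far as I understand it, the relative reduction in RMSE against Netflix's own Cinematch baseline. A minimal sketch of that arithmetic follows. The function names are made up for illustration, and the baseline value of 0.9514 (the commonly cited Cinematch RMSE on the quiz set) is an assumption, since it is not stated anywhere in this thread.

```python
# Assumed baseline: commonly cited Cinematch quiz-set RMSE (not stated in this thread).
ASSUMED_CINEMATCH_RMSE = 0.9514


def percent_improvement(rmse, baseline_rmse=ASSUMED_CINEMATCH_RMSE):
    """Relative RMSE reduction against the baseline, in percent."""
    return 100.0 * (baseline_rmse - rmse) / baseline_rmse


def rmse_for_improvement(percent, baseline_rmse=ASSUMED_CINEMATCH_RMSE):
    """RMSE a team would need in order to show a given percent improvement."""
    return baseline_rmse * (1.0 - percent / 100.0)


if __name__ == "__main__":
    # RMSE implied by the 9.64% figure quoted above, under the assumed baseline.
    print(round(rmse_for_improvement(9.64), 4))
```

Under this definition the gap between 9.64% and 9.65% is roughly 0.0001 in RMSE, which gives a feel for how tight the top of the leaderboard is.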

GPT's strategy is quite a contrast to everyone else's. almost every other team, incl the very top ones, is very resistant to what might be called "inter-team blending" for very good reasons-- it dilutes the share of the prize across multiple teams. intra-team blending is of course quite common and top teams seem to be blending 100 or more separate runs.

my personal congrats to GPT on their success & innovation!


- The Grand Prize Team homepage (http://www.grandprizeteam.com)
- The Gravity homepage (http://www.gravityrd.com)
- The Dinosaur Planet homepage (http://www.eecs.berkeley.edu/~lmackey/d … lanet.html)
- Our new JMLR paper: http://jmlr.csail.mit.edu/papers/volume … acs09a.pdf
- GPT Rules (http://www.grandprizeteam.com/GPT/rules.html)

also, here is a previous interview I did with Pragmatic Theory
http://tech.groups.yahoo.com/group/theo … sage/14099



Q. how long has GPT been in existence?

A. The first submission of GPT to Netflix was on January 20, 2009.


Q. was it your idea? how did you get the idea?

A. It was the idea of the Gravity guys (Gabor Takacs, Istvan Pilaszy,
Bottyan Nemeth and Domonkos Tikk). We all considered it likely that
a "grand coalition" will appear in the end-game phase of the competition.
We also thought that it would be nice if a collaborative team won the
Grand Prize. We wanted to play an active role in the events,
therefore we founded GPT together with team Dinosaur Planet.


Q. how many submissions have you had? currently, how many are in the winning/top mixture at the top of the leaderboard?

A. We have had more than 250 submissions.
One submission contains ~3 predictors on average.
Currently there are ~120 predictors in the top mixture.


Q. you seem to be experimenting with blending algorithms, true? what percentage gain have you gotten from improved blending algorithms?

A. In the case of GPT it is difficult to say this exactly,
because we don't know which individual predictors
were obtained from advanced blending methods.

Based on my blending experiments and my guesses about the individual predictors
I estimate that 0.2-0.3 percentage points gain is due to improved blending algorithms.
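
For context, a minimal sketch of the simplest kind of blending: a linear combination of several predictors, with weights fitted by (ridge-regularized) least squares on a set of ratings where the true values are known, such as the probe set. This is only an illustration under stated assumptions. It is not GPT's actual blending method (the interview does not describe it), the function names are hypothetical, and the data in `demo` is synthetic.

```python
import numpy as np


def fit_blend_weights(probe_preds, probe_ratings, ridge=0.0):
    """Fit linear blending weights by ridge-regularized least squares.

    probe_preds:   array of shape (n_ratings, n_predictors)
    probe_ratings: array of shape (n_ratings,) with the true ratings
    """
    X = np.asarray(probe_preds, dtype=float)
    y = np.asarray(probe_ratings, dtype=float)
    # Normal equations: (X^T X + ridge * I) w = X^T y
    A = X.T @ X + ridge * np.eye(X.shape[1])
    b = X.T @ y
    return np.linalg.solve(A, b)


def rmse(pred, target):
    """Root mean squared error between two equally sized arrays."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.sqrt(np.mean((pred - target) ** 2)))


def demo():
    """Blend three synthetic predictors (true rating plus independent noise)."""
    rng = np.random.default_rng(0)
    truth = rng.integers(1, 6, size=2000).astype(float)  # ratings 1..5
    preds = np.column_stack(
        [truth + rng.normal(0.0, s, size=truth.size) for s in (0.9, 1.0, 1.1)]
    )
    weights = fit_blend_weights(preds, truth, ridge=1e-3)
    blended = preds @ weights
    single = [rmse(preds[:, j], truth) for j in range(preds.shape[1])]
    return weights, rmse(blended, truth), single


if __name__ == "__main__":
    w, blended_rmse, single_rmses = demo()
    print("weights:", w)
    print("single RMSEs:", single_rmses)
    print("blended RMSE:", blended_rmse)
```

The "advanced blending methods" mentioned in the answer presumably go beyond this plain linear fit. Note also that the sketch scores the blend on the same ratings used to fit the weights, which a real setup would avoid by holding out data.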


Q. do you have any official feedback from netflix about your particular strategy? as I read the rules, it seems like it is fair, but I can imagine there might be objections.

A. We have no official feedback.
I think it's quite clear that the rules allow collaboration between teams.


Q. a Q I asked the other team. suppose after 5 years, there is still no winner. current rules state grand prize will not be awarded. do you prefer (a) grand prize not be awarded? or (b) it be awarded to top team, or (c) it be split in some way.

A. We have no influence on this, and we cannot predict what
we would prefer at the end of the 5th year.


Q. there is a little controversy on your approach. someone on the bulletin board suggested you are taking blending algorithms submitted for your GPT team and then using the same techniques on your own entries. any comment on that?

A. Whether or not to require the source code of blending methods was a decision
point when designing the GPT rules. Both alternatives have advantages and disadvantages.

Of course, a disadvantage of requiring source code is that theoretically
it gives the founders more opportunities to gain an unfair advantage.
But I think that regardless of what the rules are, the participants
have to trust the founders to some degree.

On the other hand, GPT's RMSE - except at the very beginning - has been far better than Gravity's, and Gravity has neither the chance nor the intent to overtake GPT. We think this fact may give an implicit assurance that Gravity will not be in a position to claim the Grand Prize alone.


Q. did you attend either of the netflix conferences? what were your impressions? do you think there should be more conferences, or since progress seems to be stalling out, hold off on further ones?

A. We attended both Netflix conferences, and we also presented papers there.
We think that publishing is important. At least it drives us to invent something new.
We have learned from the conferences that many teams have very similar ideas, thus, when you intend to publish, you have to be as fast as possible.

Exchanging ideas too frequently may have a negative effect on the diversity of solutions. I think that one conference per year is a good trade-off.
My opinion is that a new conference would give an impulse to
the contestants, and it would speed up progress.


Q. when I asked about submissions I realize I was not clear (although thanks for your answer). I meant, "submissions to the GPT team from other participants". how many submissions do you get, and how many from other teams show up in your winning/top mixture?

A. I also meant "submissions to the GPT team from other participants".
A submission to GPT can contain multiple predictors (e.g. an SVD, a KNN,
and a Boltzmann-machine). It is possible that only some of them will be included
in the top mixture. This is why I wrote the number of predictors in the top mixture
instead of the number of submissions.


Q. you say you have had 250 submissions. I am having trouble phrasing this question. but basically, I would guess that there are some number of individual participants/contacts who have data now embedded in GPT who email you or register on your site. are you saying, you have 250 separate participants/contacts?

A. No.


Q. when you said 250 submissions, I imagined that some would be by the same contacts/participants. ie one contact/participant makes multiple submissions. true?

A. Yes.


Q. roughly, can you estimate, how many individual contacts/participants are involved in GPT at this pt? also, maybe some of those contacts dont tell you what teams they are on, but can you estimate, roughly how many netflix teams would be represented/mixed into your current final solution?

A. Currently we have 22 registered participants.
7 of them are represented in the final solution.
(All are Netflix teams. Of course the 2 founders are also represented.)

One comment:
People can submit probe files to test how much they can improve
on GPT's result. Registered participants can also submit qualifying files.
When I wrote we had more than 250 submissions I meant that
the total number of submissions (with or without qualifying) is more than 250.

Maybe it would be better to leave this complicated submission stuff out,
and write the following:
Currently there are ~120 predictors in the top mixture, originating from 9 Netflix teams.


Q. what is your opinion about the utility of the prize forum to the contest?

A. I think that the forum is a useful source of information.
Of course it contains useless threads too,
but even top teams can find valuable information in it.


Q. in your view, how much has the overall field of collaborative filtering been advanced by the netflix contest in particular?

A. The Netflix contest made it possible to compare collaborative filtering
algorithms under perfectly equal conditions. The performance measure
is very reliable and cheating is impossible.

In my opinion, this caused a revolution in collaborative filtering.
It turned out that many old algorithms do not scale well to large problems,
and that they are not as accurate as claimed in scientific papers.

The current state-of-the-art algorithms are very different
from what we had 3 years ago.
On the other hand, we realized that, surprisingly, algorithms that originated
in the Netflix Prize are often treated as practitioner solutions and are not well acknowledged by the scientific community. We also find it strange that
some papers still use old and small datasets to justify their scientific value,
despite the availability of the Netflix dataset.


Q. everyone on your teams were probably working in areas not related to collaborative filtering prior to the contest, to some degree. to what degree were you already working in it? can you comment on whether some of the techniques in the contest are carrying over into other areas of research that you've worked in? any applicability of the ideas?

A. (I don't know the answer of the Dinosaur Planet guys.)

The members of Gravity had not heard of collaborative filtering before the contest,
but all of us had some experience in data mining.
I think that we have learned a lot from the contest.
Some of the techniques (e.g. efficient matrix factorization) can directly
be applied in other areas. The utility of other ideas is more indirect.
I believe that the experience we have acquired during
the contest made us better data mining problem solvers.

Last edited by vzn (2009-05-13 20:01:48)

#2 2009-05-14 11:01:46

ADifferentName
Member
Registered: 2008-06-29
Posts: 19

Re: interview, new #2 team, GPT, grand prize team

Very interesting interview.  You answered several things that I had wondered about.  Thanks GPT and vzn!

#3 2009-05-14 21:52:20

dale5351
Member
From: Columbia, MD
Registered: 2008-10-18
Posts: 116

Re: interview, new #2 team, GPT, grand prize team

Interesting, and worth considering.

I am confused about the submission format which allows two floating point numbers per user.  The Netflix qualifying allows only the single prediction for the qualifying set.  What are the two numbers supposed to be?
