Frequently Asked Questions

 

Participation and submission | Prize structure | Legal stuff

Participation and submission

Why are some countries excluded from participation?

Those countries are on the U.S. Treasury Office of Foreign Assets Control’s list of embargoed countries, to which we cannot provide economic assistance. If this list changes, we’ll post a change to the rules and let you know.

Why restrict the number of prediction submissions to one per day?

To dissuade you from just training on the scoring oracle’s responses. See the discussion on the probe subset below if you still want to look under a streetlight...

My submitted predictions file: compressed or not?

Your call, as long as the MD5 signature matches the file you submit. We’ll apply gunzip to your submission and score the result if it succeeds; otherwise we’ll score the file you sent directly. A typical prediction submission file is around 16MB uncompressed and around 2.5MB compressed. Oh, and you might want to check that your MD5 program gets the same signatures we included for the dataset files we provided.
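If you want to double-check before uploading, a minimal sketch like the following (in Python; the file names are just placeholders) compresses a predictions file and prints the MD5 signature of each version, so you can report whichever one you actually submit:

    import gzip
    import hashlib
    import shutil

    def md5_of(path, chunk_size=1 << 20):
        """Return the MD5 hex digest of a file, read in chunks."""
        digest = hashlib.md5()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                digest.update(chunk)
        return digest.hexdigest()

    # Optionally gzip the predictions before submitting (your call).
    with open("predictions.txt", "rb") as src, gzip.open("predictions.txt.gz", "wb") as dst:
        shutil.copyfileobj(src, dst)

    print("uncompressed MD5:", md5_of("predictions.txt"))
    print("compressed MD5:  ", md5_of("predictions.txt.gz"))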

The submission file format is described in the README included with the training set you download. You can also use this script to check the format of your submission.

Where can I find an MD5 program?

On *nix systems, try /usr/bin/md5sum. On Windows try the digestIT 2004 utility. Avoid older Perl implementations on Windows; some versions, notably 5.6, have given us grief.

Who can use the Forum?

The Forum is a place for you (and others like you) to carry on open discussions about topics related to the Prize. Anybody can read the Forum; posting is restricted to registered forum members. Registering for the Forum and registering for the Prize are separate; you don’t have to be in the competition to use the Forum.

Prize structure

Why RMSE as the metric? Why not other metrics?

RMSE, besides being well known and a single number, has the nice property that it amplifies the contributions of egregious errors, both false positives ("trust busters") and false negatives ("missed opportunities"). These are important properties to understand about any recommendation system. Of course, it is true that simple prediction accuracy does not address many of the other important aspects of making (and taking) a recommendation. It doesn’t deal with, for example, deciding in what order recommendations should be made. For an excellent discussion of metrics and evaluating recommender systems, see Herlocker, J., Konstan, J., Terveen, L., and Riedl, J. Evaluating Collaborative Filtering Recommender Systems. ACM Transactions on Information Systems 22 (2004), ACM Press, 5-53.
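For concreteness, here is a minimal sketch of the metric itself (nothing Netflix-specific, just the textbook definition, with a toy illustration of how one bad miss dominates the score):

    import math

    def rmse(predicted, actual):
        """Root mean squared error over paired lists of ratings."""
        assert len(predicted) == len(actual)
        squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
        return math.sqrt(sum(squared_errors) / len(squared_errors))

    # One prediction that is off by two stars dominates the score...
    print(rmse([3.0, 3.0, 3.0, 3.0], [3.0, 3.0, 3.0, 5.0]))  # 1.0
    # ...while four predictions each off by half a star hurt far less.
    print(rmse([3.5, 2.5, 3.5, 4.5], [3.0, 3.0, 3.0, 5.0]))  # 0.5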

What about measuring the confidence of the predictions?

Yep, we calculate this and use it as part of making a recommendation. But not all systems can provide this, and we didn’t want to bother formalizing it for the Prize at this point.

What about measuring the coverage of predictions?

Yes, knowing which titles a system can make predictions for would be useful. But, as with confidence, rather than making it a criterion for winning we figure we’d simply ask the winners to report on these aspects of their system. Of course, improvement in RMSE need not translate into an increase in coverage or confidence. But we’ll see.

Why did you choose 10% improvement? Have you made a 10% improvement?

The RMSE on the test subset, if you just predicted the average rating for each movie based on the training dataset, is 1.0540. Cinematch’s RMSE on the test subset is 0.9525, a 9.6% improvement. We figured we’d aim to roughly double that. The same thinking applies to the Progress Prize: we want significant improvement, hence the 1% improvement required over the previous prize winner.
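For the record, the 9.6% figure follows directly from those two numbers:

    # The 9.6% figure quoted above, computed from the two RMSE values.
    trivial_rmse = 1.0540     # predict each movie's training-set average rating
    cinematch_rmse = 0.9525   # Cinematch on the test subset

    improvement = (trivial_rmse - cinematch_rmse) / trivial_rmse
    print(f"{improvement:.1%}")  # 9.6%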

How does Cinematch do it?

Straightforward statistical linear models with a lot of data conditioning. But a real-world system is much more than an algorithm, and Cinematch does a lot more than just optimize for RMSE. After all, we have a website to support. In production we have to worry about system scaling and performance, and we have additional sources of data we can use to guide our recommendations. But, as mentioned in the Rules and just to be perfectly clear, for the purposes of the Prize the RMSE values we report here do not use any of this extra data.

So why not include performance requirements on the learning and prediction system?

Yes, this is absolutely critical to the success of Cinematch -- in production. And we are, of course, very curious about the performance of all the algorithms. But including performance constraints would make the contest much more complicated, so we decided we’d worry about the system-scaling issues once we knew there was a "there there."

And while we are on the topic of system performance, we should note that it took Cinematch a non-trivial amount of time to produce its predictions for the quiz and test subsets. And we’re not even talking about the time it took to learn from the training dataset. Think days. We’re talking serious horsepower here. Just a heads up.

Why only 100 million ratings? You’ve got over a billion after all...

The number was chosen so the dataset would be "easily" downloadable and could fit in memory on most machines, if needed. But you might think differently after you get started.

In fact, we should give you another heads up: the full Prize download file (compressed using gzip), containing the training and qualifying datasets, the movie list, and other files, weighs in at a little shy of 700MB. This will take a bit of time to download on most networks. For reference, the MD5 of the download file is 405b35960651cce0378b2b268e719df5.

How well does Cinematch do using all those extra 100’s of millions of ratings?

The RMSE experienced by customers on the Netflix site is significantly better than the RMSE reported for the training dataset. This is due both to the increase in ratings data and to additional business logic we use to tune which of the large number of ratings to learn from. However, we’re keeping our actual RMSE on the site, as well as our tuning heuristics and settings, confidential. But let’s just say we’d be seriously in the running for a Progress Prize, if we were eligible.

Why aren’t the margins of improvement linear with the size of the prizes?

Because the business responses aren’t. But don’t read too much into the discrepancy. Also, coincidentally, $50,000 is 5% yearly interest on $1,000,000.

Why provide rating dates and movie names?

Other datasets provide it. Cinematch doesn’t currently use this data. Use it if you want.

As it happens, we provided the year of release for all but seven movies in the dataset; those seven have NULL as the "year" of release. Sorry about that.

Why not provide other data about the movies, like genres, directors, or actors?

We know others do. Again, Cinematch doesn’t currently use any of this data. Use it if you want.

Why this whole quiz/test subset structure? Why not reveal a submission's RMSE on the test subset?

We wanted a way of informing you and your competitive colleagues about your progress toward a prize while making it difficult for you to simply train and optimize against "the answer oracle". We also wanted a way for the judges to determine how robust your algorithm is. So we have you supply nearly 3 million predictions, then tell you and the world how you did on one half (the "quiz" subset) while we judge you on how you did on the other half (the "test" subset), without telling you that score or which of your predictions apply to which subset.
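Purely as an illustration (this is not our actual scoring code, and the real split is fixed once and never disclosed), the idea looks something like this:

    import math
    import random

    def rmse(pairs):
        """RMSE over an iterable of (predicted, actual) pairs."""
        pairs = list(pairs)
        return math.sqrt(sum((p - a) ** 2 for p, a in pairs) / len(pairs))

    def score_submission(predictions, actual_ratings, seed=0):
        """predictions and actual_ratings are dicts keyed by (movie_id, customer_id)."""
        keys = sorted(actual_ratings)
        random.Random(seed).shuffle(keys)   # fixed, undisclosed split into two halves
        half = len(keys) // 2
        quiz, test = keys[:half], keys[half:]
        quiz_rmse = rmse((predictions[k], actual_ratings[k]) for k in quiz)
        test_rmse = rmse((predictions[k], actual_ratings[k]) for k in test)
        return quiz_rmse, test_rmse         # only quiz_rmse is ever reported back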

What is the probe subset for? It wasn’t mentioned in the Prize Rules.

The probe subset helps reduce the number of times you need to go to the scoring oracle. It is similar in both size and characteristics to the quiz subset. However, unlike the quiz subset, you do have the answers for the probe subset: it enumerates a set of customer and movie id pairs whose ratings and dates are included in the training set we supplied. You just need to ask your system to make predictions for those pairs and then compute your RMSE against the actual ratings for the pairs.
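A minimal sketch of that computation, assuming you have already parsed the probe file into (movie, customer) pairs and have the training ratings and your predictor in hand (all names below are placeholders):

    import math

    def probe_rmse(probe_pairs, training_ratings, predict):
        """
        probe_pairs:      iterable of (movie_id, customer_id) pairs from the probe file
        training_ratings: dict mapping (movie_id, customer_id) to the actual rating
                          (every probe pair's rating is in the training set you have)
        predict:          your system's rating function, (movie_id, customer_id) -> float
        """
        squared_errors = [
            (predict(movie_id, customer_id) - training_ratings[(movie_id, customer_id)]) ** 2
            for movie_id, customer_id in probe_pairs
        ]
        return math.sqrt(sum(squared_errors) / len(squared_errors))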

The RMSE Cinematch can achieve on the probe dataset is 0.9474. You can compare your progress against that number as often as you want. After someone wins the Grand Prize we’ll release the withheld ratings in the quiz and test subsets. Even before then, we want to make a lasting contribution to the academic community: providing standard training and test sets helps people share observations and results while the Prize is in progress.

By the way, the MD5 signature of the judging file, which contains the qualifying set ratings and defines the quiz and test subset members, is 33d98576809268177d35c182898b6439. We’ll publish that file at the end of the Contest, when the entire dataset is deposited in the Machine Learning Archive at UC Irvine.

Legal stuff and the license

What’s with the third-party license restriction?

Software licensing is a complex bit of business. We don’t want the non-exclusive license you provide us to be encumbered by additional licenses and costs.

What does "originally developed" mean?

It means you wrote the code that counts and didn’t just acquire it from someone else. We want the Prize to go to the person who did the heavy lifting. Now, of course, bundling together public-domain algorithms in a clever way counts as heavy lifting. Heck, that’s what we did.

What is all this about "moral rights"?

Read this for an overview of moral rights. Again, we don’t want our non-exclusive license encumbered by additional restrictions.

Why the non-exclusive license?

First, we want to verify for everyone that the code did what was claimed; that means looking at it. And then we want to use it if we can. We’re a business and we want to make sure we can capitalize on the discovery. But we don’t want to impede the winner’s ability to capitalize on it as well. Actually, we hope they can build their own business and license it to others as well. That is the point after all.

Is there any customer information in the dataset that should be kept private?

No, all customer identifying information has been removed; all that remains are ratings and dates. This follows our privacy policy, which you can review here. Even if, for example, you knew all your own ratings and their dates, you probably couldn’t identify them reliably in the data because only a small sample was included (less than one-tenth of our complete dataset) and that data was subject to perturbation. Of course, since you know all your own ratings, that really isn’t a privacy problem, is it?

Why so much legalese? Can’t we just relax about the whole thing?

We’d like to blame the lawyers, but on this they have a good point. When the Prize involves this much money, some people will try to work the angles rather than the problem. We don’t want that impulse ruining the Prize for everyone else. So chalk it up to better safe than sorry. Also, since we expect the Prize will be out there for a while, we tried to anticipate as much "downside" as we could on the mechanics. We’re sure you’ll tell us when we missed something.

So who are the judges?

There are four judges for the Contest, two senior engineers from Netflix and two distinguished researchers from the Machine Learning community. Jon Sanders is Director of Recommendation Systems at Netflix. Stan Lanning is the developer of Cinematch. Before Netflix both of them worked at Pure Software developing the Purify™ and Quantify™ tools. Padhraic Smyth is Professor, Department of Computer Science, University of California, Irvine. Charles Elkan is Professor, Department of Computer Science, University of California, San Diego.

I have other questions. Where can I learn more?

Please review the FAQ and other discussions that are part of the Forum.