Forum for discussion about the Netflix Prize and dataset.
I would like to bring your attention to two papers we have recently published:
1. Investigation of Various Matrix Factorization Methods for Large Recommender Systems.
In 2nd Netflix-KDD Workshop.
2. A Unified Approach of Factor Models and Neighbor Based Methods for Large Recommender Systems.
In 1st IEEE Workshop on Recommender Systems and Personalized Retrieval.
Both can be downloaded at our website.
The first can be downloaded from netflixkddworkshop2008.info as well.
Interestingly, a simple yet effective model has been developed independently by Gravity and BellKor:
Gravity's hybrid MF+NSVD1 method and BellKor's SVD++ are almost the same.
See equation (8) of our IEEE workshop paper, and equation (15) of BellKor's article:
Factorization Meets the Neighborhood: a Multifaceted Collaborative Filtering Model,
In Proceedings of the 14th ACM Int. Conference on Knowledge Discovery and Data Mining (KDD'08).
http://public.research.att.com/~yehuda
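For readers without the papers at hand, here is a minimal sketch of the prediction rule as SVD++ is usually written: a global mean, user and item biases, and an item factor multiplied by the sum of the explicit user factor and a normalized sum of implicit factors of the items the user rated. The function and argument names are made up for illustration, and the exact notation of equation (8) in the Gravity paper is not reproduced here.

```python
import numpy as np

def svdpp_predict(mu, b_u, b_i, p_u, q_i, Y, rated_items):
    """Predict one rating with an SVD++-style rule.

    mu: global mean rating; b_u, b_i: user and item biases;
    p_u, q_i: explicit user and item factor vectors (length k);
    Y: (n_items, k) matrix of implicit item factors;
    rated_items: indices of the items the user has rated, i.e. N(u).
    """
    rated_items = np.asarray(rated_items, dtype=int)
    if rated_items.size > 0:
        # |N(u)|^(-1/2) * sum of y_j over the items the user rated
        implicit = Y[rated_items].sum(axis=0) / np.sqrt(rated_items.size)
    else:
        implicit = np.zeros_like(p_u)
    return mu + b_u + b_i + q_i @ (p_u + implicit)
```

The implicit term depends only on which items the user rated, not on the rating values.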
Last edited by Gravity (2009-04-19 13:04:39)
Offline
First of all, thanks for the wonderful paper.
I'm amazed by the probe10 set you have constructed, and thanks for sharing it. I would like clarification on one thing, though.
In these two papers you don't mention removing the probe data from the total training data. This is what everyone has been doing in this competition for a long time now, since it is really helpful when we blend multiple algorithms and need to find the weight for each one.
For each algo, I train on all data minus probe data to get the probe predictions and then re-run the algo with all the data included to get the qualifying predictions. This, of course, is a huge burden since I have to run everything twice.
Do you use a similar scheme but with probe10 removed from the training data? Or is it even better? I was thinking of something like removing probe10 from the data, running each algo once, and then calculating the qualifying predictions from that same training run.
In this case, the "fully trained" quiz RMSE will be very close to the "fully trained minus probe10" quiz RMSE, since only 140,840 ratings will be absent from the training phase.
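To make the blending step concrete, here is a rough sketch assuming plain linear blending fitted by least squares. The function names are made up for illustration, and the probe predictions must come from models trained without the held-out ratings.

```python
import numpy as np

def fit_blend_weights(holdout_preds, holdout_ratings):
    """Least-squares blend weights.

    holdout_preds: (n_ratings, n_algos) array, one column per algorithm,
    predicted by models that did NOT see the held-out ratings.
    holdout_ratings: (n_ratings,) array of the true held-out ratings.
    """
    weights, *_ = np.linalg.lstsq(holdout_preds, holdout_ratings, rcond=None)
    return weights

def blend(preds, weights):
    """Apply the fitted weights to any prediction matrix with the same columns."""
    return preds @ weights

def rmse(pred, truth):
    """Root mean squared error between predictions and true ratings."""
    return float(np.sqrt(np.mean((pred - truth) ** 2)))
```

In the two-pass scheme the weights are fitted on the probe predictions and then applied to qualifying predictions from the retrained models. In the one-pass scheme the same models produce both sets of predictions.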
Thanks in advance.
Offline
Quiz and Probe10 RMSEs are from predictions made by the same model.
We do not use fully trained models.
I guess the improvement would be less than 0.0006 if we generated the models from all available training data.
Offline
Just Brilliant! It would have saved me so much time if only I had this before...
THANKS for sharing.
Offline
We published a new paper that summarizes our matrix factorization approaches, and also evaluates them on the MovieLens and the Jester datasets:
Scalable Collaborative Filtering Approaches for Large Recommender Systems, Journal of Machine Learning Research, March 2009
Available here and here
Last edited by Gravity (2009-04-19 13:06:09)
Offline
From the paper:
Since there is no standard train-test split of the data set, we applied a simple random
split to generate a 90%–10% train-test setting.
This split can be downloaded from our website: http://gravityrd.com.
Could you post a link to download the MovieLens data split?
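For anyone who only needs a split of the same kind, a simple random 90%/10% split can be sketched as below. It will not reproduce the authors' exact split, and the function name and seed are arbitrary choices for illustration.

```python
import numpy as np

def random_split(n_ratings, test_fraction=0.1, seed=0):
    """Return (train_idx, test_idx) for a simple random split of rating rows."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_ratings)            # shuffled row indices
    n_test = int(round(n_ratings * test_fraction))
    return perm[n_test:], perm[:n_test]
```

The returned index arrays can be used to select rows from a ratings table. Fixing the seed keeps the split repeatable across runs.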
Offline