Netflix Prize: Forum

Forum for discussion about the Netflix Prize and dataset.


Announcement

Congratulations to team "BellKor's Pragmatic Chaos" for being awarded the $1M Grand Prize on September 21, 2009. This Forum is now read-only.

#1 2007-12-18 16:05:37

YehudaKoren
Member
Registered: 2007-09-23
Posts: 54

How useful is a lower RMSE?

Various conversations on this forum and on other blogs have wondered about the usefulness of achieving a lower RMSE.
That is, what impact on users can we expect from lowering the RMSE by, say, 10%?
This has even led some posters to question the importance of the Netflix Prize challenge.
I would like to shed some light on this important issue by examining the effect of a lowered RMSE in a practical situation.

A common task facing recommender systems is providing "top K recommendations". That is, the system needs to recommend to the user the K products that are supposed to be most appealing to him, for example a few specific movies. A major question is how lowering the RMSE affects the quality of these top K recommendations.

To evaluate this, I used all 5-star ratings from the Probe set as a proxy for movies that interest the user. My goal is to find the relative place of these "interesting movies" within the total order of movies sorted by predicted ratings for a specific user.
To this end, for each such 5-star rated movie M, rated by user U, I selected 20 additional random movies and predicted the ratings by U for M and for the other 20 movies. Finally, I order the 21 movies by their predicted ratings, in decreasing order. (The number 20 is arbitrary and has no real effect on the discussed result.)

Notice that the 20 movies are random; some of them may be of interest to user U, but most of them are probably of no interest to U. Hence, the best hoped-for result is that M (which we know U rated 5) will precede the 20 random movies. A case where none (0%) of the random movies appears before M in the ranking is scored 0%. If a single random movie out of the twenty (5%) appears before M, the score is 5%, and so on, such that if all (100%) of the random movies appear before M in the sorted order, the associated score is 100%. I call this scoring a "ranking-based test".
Overall, the Probe set contains 384,573 5-star ratings. For each of them (separately) we draw 20 random movies, predict the associated ratings, and derive a score between 0% and 100%. Finally, we average the scores over all 384,573 5-star cases to obtain the final ranking-based score.
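For concreteness, here is a minimal Python sketch of this ranking-based test (purely illustrative; predict(user, movie) stands for whatever rating predictor is being evaluated, and the names and data layout are arbitrary, not taken from anyone's actual code):

    import random

    def ranking_score(predict, user, five_star_movie, all_movies, n_samples=20):
        # Draw n_samples random movies other than the known 5-star one, and count
        # how many of them the predictor ranks ahead of it.
        candidates = [m for m in all_movies if m != five_star_movie]
        sample = random.sample(candidates, n_samples)
        target = predict(user, five_star_movie)
        ahead = sum(1 for m in sample if predict(user, m) > target)
        return ahead / n_samples  # 0.0 (best) .. 1.0 (worst)

    def overall_ranking_score(predict, probe_five_stars, all_movies):
        # probe_five_stars: list of (user, movie) pairs rated 5 in the Probe set
        scores = [ranking_score(predict, u, m, all_movies) for u, m in probe_five_stars]
        return sum(scores) / len(scores)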

Let's see how different methods fare. We begin with the popularity-based method, where the predicted rating is taken simply as the mean rating of the movie. Such a method yields an RMSE of 1.0527 on the Probe set. In our ranking-based test, this method achieves a score of 18.70%, meaning that the 5-star rated movies are ranked, on average, after 18.70% of the other movies.

A more sophisticated approach involves collaborative filtering. Let's start with the very simple movie-movie neighborhood-based approach, where the weights are taken as Pearson correlation coefficients. This was the most common approach in the literature preceding the Netflix Prize competition. It yields an RMSE of 0.9430 on the Probe set, an improvement of 10.42% over the movie-mean RMSE. The ranking-based score for this method drops to 14.92%, an improvement of 20.2% over the baseline approach.
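As a reminder of what such a predictor looks like, here is a bare-bones sketch (it assumes the movie-movie Pearson correlations are precomputed; the data layout and names are illustrative, not any particular team's implementation):

    def predict_knn(user_ratings, sims, movie, k=20):
        # user_ratings: {movie_id: rating} for the target user
        # sims: {(m1, m2): Pearson correlation} over movie pairs
        # The prediction is the similarity-weighted average of the user's own
        # ratings on the k movies most correlated with the target movie.
        neighbours = sorted(
            ((sims.get((movie, j), 0.0), r) for j, r in user_ratings.items() if j != movie),
            key=lambda sr: abs(sr[0]), reverse=True)[:k]
        num = sum(s * r for s, r in neighbours)
        den = sum(abs(s) for s, _ in neighbours)
        if den == 0:
            return sum(user_ratings.values()) / len(user_ratings)  # fall back to the user's mean
        return num / den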

Now let's try a latent factor model enhanced by neighborhood information. Its RMSE on the Probe set is 0.8949 (a 15% improvement over the baseline). We can clearly achieve better RMSEs using our full model, but the chosen method still represents some of the innovations developed during the first year of the Netflix Challenge. For this method the ranking score drops to 10.72%. The interpretation is that when ranking all movies for a single user by their predicted rating, only 10.72% of the movies will, on average, precede those rated 5 within the Probe set. Notice that this is a very significant 42.67% improvement over the baseline result (which was 18.70%), and a 28.15% improvement over the standard CF model.
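The full model is more involved, but a stripped-down latent factor sketch conveys the idea (plain regularized matrix factorization trained by stochastic gradient descent, without the neighborhood component or bias terms of the actual method, and with arbitrary hyperparameters):

    import numpy as np

    def train_mf(ratings, n_users, n_movies, n_factors=50, lr=0.005, reg=0.02, epochs=20):
        # ratings: list of (user, movie, rating) triples with 0-based ids
        rng = np.random.default_rng(0)
        P = rng.normal(0, 0.1, (n_users, n_factors))   # user factors
        Q = rng.normal(0, 0.1, (n_movies, n_factors))  # movie factors
        mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating
        for _ in range(epochs):
            for u, m, r in ratings:
                pu, qm = P[u].copy(), Q[m].copy()
                err = r - (mu + pu @ qm)
                P[u] += lr * (err * qm - reg * pu)
                Q[m] += lr * (err * pu - reg * qm)
        return mu, P, Q

    def predict_mf(model, u, m):
        mu, P, Q = model
        return mu + P[u] @ Q[m]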

What I learn from this is that small improvements in RMSE translate into very significant improvements in the quality of the top K movies. In other words, a 1% improvement in RMSE can make a big positive difference in the identity of the "top-10" most recommended movies for a user.

All this makes me optimistic about the practical benefits of the hard work that people have put into this competition.

Happy New Year,
Yehuda

Last edited by YehudaKoren (2007-12-18 21:32:46)

Offline

 

#2 2007-12-18 20:59:27

sanity
Member
Registered: 2007-11-26
Posts: 10

Re: How useful is a lower RMSE?

I'm just confused about the rationale behind your approach.  I think you are going in the right direction by focusing on the fact that recommendation engines are often used to select those items that have the top predicted rating, but after that I see little justification behind your approach.

Why do you think this approach is a good way to estimate the quality of the user's experience?  Can you contrast this with the approach I outline here?

Last edited by sanity (2007-12-18 21:03:00)

Offline

 

#3 2007-12-18 21:28:59

YehudaKoren
Member
Registered: 2007-09-23
Posts: 54

Re: How useful is a lower RMSE?

Ian,

I proposed a natural measure for assessing the quality of top-K recommendations, or in other words, for evaluating how well a method orders movies for a user. Lower ranking scores indicate better ordering, and importantly, these scores have a direct intuitive interpretation: the fraction of all movies that are ranked before those rated 5 in the Probe set.

However, frankly, I couldn't find a direct link between your approach and the quality of movie ordering. You compare the distribution of true ratings against predictions, which has much more to do with the RMSE itself, so it is no wonder that a small change in the RMSE translates into a small change in that distribution.

I would encourage you to come up with an alternative metric, but to be plausible it should stem formally and directly from the movie-ordering objective.


Yehuda

Offline

 

#4 2007-12-18 22:03:11

sanity
Member
Registered: 2007-11-26
Posts: 10

Re: How useful is a lower RMSE?

Yehuda,

Your assessment of your approach as intuitively being proportional to user satisfaction is entirely subjective, based on what I've understood so far.  I was hoping you had some kind of objective justification for why this method of measurement maps in some useful way to user satisfaction.

I'm not trying to defend my approach.  I completely acknowledge that I have not yet offered an objective justification for my method of measurement.

I think we need to develop a measurement approach that maps to some plausible model of user satisfaction, one that would be useful to a likely user of a recommendation system.

An example of such a plausibility measurement would be to assume that the operator of the CF algorithm generates revenue every time it successfully recommends an item that the end user likes.  If the algorithm recommends 10% more items that meet this success threshold, then the operator generates 10% more revenue.  This assumption may not be exactly true for Netflix because of their business model, but it will be valid for many potential users of collaborative filtering systems.

Perhaps your approach meets this criterion, in which case I'd be grateful if you could explain why.

Ian.

Last edited by sanity (2007-12-18 22:03:57)

Offline

 

#5 2007-12-19 00:32:57

DB
Member
From: Home
Registered: 2006-10-20
Posts: 114

Re: How useful is a lower RMSE?

Ian,

Yehuda is correct.
Your analysis is based on comparing the distribution of actual values vs. predicted, which is meaningless.
You should compare predicted vs. actual and not vice versa. Just to illustrate, see the following example: with your approach, certain data will show that for each predicted value, most of the actual values match the prediction, while if you turn the rows into columns for the same data, the second graph shows that for each actual-value group the predicted values are totally disorganized:

http://farm3.static.flickr.com/2048/2122482374_b2542b7df7.jpg

Offline

 

#6 2007-12-19 02:27:54

Bored Bitless
Member
From: Leamington Spa, UK
Registered: 2007-02-22
Posts: 154

Re: How useful is a lower RMSE?

Thanks Yehuda, a nice, simple, and clear-cut experiment IMHO.

Recently I tried using my various models to predict ratings for myself. I built an interactive GUI that allows me to add my own ratings and then refresh the list of predicted ratings in descending order. What I noticed was that small improvements to the model don't radically alter the overall RMSE of most ratings, but DO tend to significantly alter the top-down order of the movies. So in fact small changes in the predicted ratings can radically alter their order.

Another way of thinking about this issue would be to consider that a small lowering of RMSE may be due to many small adjustments to many ratings versus a few large adjustments. If the latter case were predominant, then the business case would probably suffer, because we would simply be improving the top-N movie predictions for relatively few users.

BB

Last edited by Bored Bitless (2007-12-19 11:32:04)

Offline

 

#7 2007-12-19 06:40:40

YehudaKoren
Member
Registered: 2007-09-23
Posts: 54

Re: How useful is a lower RMSE?

sanity wrote:

Yehuda,

I think we need to develop a measurement approach that maps to some plausible model of user satisfaction, one that would be useful to a likely user of a recommendation system.
...
Perhaps your approach meets this criterion, in which case I'd be grateful if you could explain why.
Ian.

Yes, Ian. Actually, this is exactly what I tried to do, given the available data. Those movies rated 5-stars were my best proxy for suggestions that will satisfy the users, and hence I judge order quality by the relative place of these "satisfying" recommendations within the full order.


Let me point out a few more issues with your previous measure:

1) You never measure an order, but just refer to ratings above or around some value. This way you fail to distinguish between, e.g., 1st place and 10th place. For example, the vast majority of predicted ratings would be above 3.5. Breaking all those ratings >3.5 into bins is not very meaningful, as it disregards the internal ordering of all those ratings.

2) A measure that caters for a top-K recommender must treat each user separately, as the goal is to supply the best K recommendations for each single user. In other words, we are not trying to order all movie-user pairs; rather, after fixing a single user, we try to order all movies for him. However, your measure bundles all users together, thus becoming irrelevant for evaluating top-K quality and much more related to RMSE quality.

3) Notice that my measure is based on many more test points than just the Probe set itself. In fact, it is trying to evaluate the quality of the total order of the 17,770 movies for each individual user.

Yehuda

Offline

 

#8 2007-12-19 08:52:33

sanity
Member
Registered: 2007-11-26
Posts: 10

Re: How useful is a lower RMSE?

Thanks for the explanations guys.  I will digest and respond if I have further questions.

Frankly I will be glad if my initial assessment is wrong, having spent so much time working on a CF algorithm :-)

Offline

 

#9 2007-12-19 11:28:17

sanity
Member
Registered: 2007-11-26
Posts: 10

Re: How useful is a lower RMSE?

Ok, I've thought about it a bit more, but I'm still unclear about how Yehuda's approach might map to an economic advantage for a user of the algorithm.  Thanks to everyone for being patient with me :-)

Let's construct a scenario where the effectiveness of a recommendation algorithm has a direct impact on an economic outcome.  I want to come at this from the perspective of someone who wants to sell a CF algorithm and needs to make an economic case for it.

I have a website where I want to recommend products to users.  For a given user, I need to select 5 products to show the user, in the hope that the user will wish to purchase one or more of those products.  My revenue is proportional to the total number of products purchased.

If we can determine the mean number of products purchased by a user with different recommendation approaches, then we can say "you will generate X% more revenue using this algorithm versus this other algorithm".  So we might find that if we select products at random, the average user purchases 0.3 products, but with a more sophisticated approach this is increased to 0.5, hence roughly a 67% increase in revenue.

With the Netflix dataset we could say that a 5 star rating equates to a purchase, and then we could look at the average number of 5 star ratings in the top 5 probe recommendations for different algorithms.

I've spent some time talking to potential users of CF algorithms, and this is the kind of metric they want.  An RMSE isn't useful to them because they can't use it to estimate an increase in revenue.  Yehuda's metric, of where the desirable item would appear within an ordered list of 20 randomly selected items, doesn't seem to map to an economic advantage either.

Am I missing something?

Last edited by sanity (2007-12-19 11:43:54)

Offline

 

#10 2007-12-19 12:00:10

YehudaKoren
Member
Registered: 2007-09-23
Posts: 54

Re: How useful is a lower RMSE?

Ian,

There may be some misunderstanding:
My measure doesn't check where the 5-star ratings fall within 20 movies, but checks where they fall among ALL 17,770 movies.
That is, a ranking score of 10% means that when ordering all movies for a specific user, we expect the 5-star-rated movies from the Probe set to fall, on average, after 10% of the movies in this order.
Picking 20 random movies is just a computationally efficient way to achieve this goal through sampling. In fact, you can change the number 20 to 10 or to 17,770, and the result will stay virtually the same.
Maybe this helps; I'm not sure that I understood what you are looking for. Anyway, I will also try another angle.

I agree that for some datasets we can formalize your economic notion into a working quality measure, especially datasets where the observed user actions are monetary transactions...
However, the Netflix data doesn't supply us with economic values but with interestingness values (star ratings).
Therefore, I was after measuring the "interestingness" of the ordered recommendations given to the user, and suggested an appropriate measure accordingly.

BTW, I am sure that there are other valid measures to evaluate this. You just need to check that these measures pass some sanity checks, and can work robustly with the limitations of the available data.

Yehuda

Offline

 

#11 2007-12-19 14:09:45

sanity
Member
Registered: 2007-11-26
Posts: 10

Re: How useful is a lower RMSE?

Yehuda,

I do understand that the 20 movies are just a sampling, and that the percentage actually pertains to the position of the 5 star movies within the entire dataset.

My proposal is that, through a few assumptions, we could infer the economic benefit of using a particular CF algorithm based on its performance on the Netflix dataset.  Obviously, the Netflix dataset isn't perfect for this; it would be better if it actually dealt with monetary transactions rather than indications of how interesting something is.  But I think it's probably good enough.

For example, just as you have defined a 5-star rating to be a special threshold of interestingness for the purpose of your measure, we could make the assumption that a 5-star rating from a user results in an economic transaction, and then look at the economic benefit of differing approaches.

Imagine that you were trying to sell your CF algorithm to someone based on some metric of its performance.  I'm sure you agree that this would be difficult based on RMSE, because the potential buyer would have no way to map RMSE to economic benefit.  Similarly, I can't see any way to map your measure to an economic benefit for the user.

But I can see how the mean number of 5-star actual ratings in the top-N predicted ratings for a given user would translate into a relative economic benefit that people could understand.  If one algorithm produced, on average, 0.5 5-star ratings in the top-10 recommendations, and another algorithm produced, on average, 0.8 5-star ratings, then this can be used to infer the relative revenue that could be generated using each of these approaches.
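A rough Python sketch of what I mean (the data layout and names are placeholders, not anyone's actual code):

    def mean_five_star_hits(predict, probe_by_user, all_movies, n=10):
        # probe_by_user: {user: {movie: true_rating}} taken from the Probe set
        # For each user, rank every movie by predicted rating and count how many
        # of the top-n are movies the user actually rated 5.
        hits = []
        for user, truth in probe_by_user.items():
            ranked = sorted(all_movies, key=lambda m: predict(user, m), reverse=True)
            hits.append(sum(1 for m in ranked[:n] if truth.get(m) == 5))
        return sum(hits) / len(hits)

Under the "a 5-star rating equates to a purchase" assumption, the ratio of this number between two algorithms maps directly to a relative revenue figure.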

Offline

 

#12 2007-12-19 14:34:37

Lazy Monkey
Member
Registered: 2007-12-13
Posts: 93

Re: How useful is a lower RMSE?

Sanity

I think you may be asking for something too specific.  Netflix wants a better theory of collaborative filtering; they have picked the measurement and named their price.  The whole task is somewhat artificial, since we have one piece of information that will not be available in commercial use: we know that the user has already rated the movie.

Artificial as the yardstick may be, it may still be the best one available to achieve what Netflix wants to achieve.  The benefits for Netflix from making good recommendations could be substantial.  Given the subscription nature of the Netflix business, it is not important that users rent a lot of movies (in fact that is a negative for Netflix); it is extremely important that customers get value for money from the movies they do rent.

The business challenge for Netflix is to predict the movies that a user will like, but part of that challenge is avoiding false positives: recommending a movie that the user does not enjoy.

Offline

 

#13 2007-12-19 14:46:07

YehudaKoren
Member
Registered: 2007-09-23
Posts: 54

Re: How useful is a lower RMSE?

Ian,

If I were selling my algorithm, RMSE could be a good starting point to get a potential buyer listening. Afterwards, I would start a small trial with a set of actual users and test how the algorithm improves some measurable factors in the subsequent behavior of those users.

Anyway, if I had to design the competition, I couldn't choose a better evaluation metric than RMSE. First, unlike MAE, it is mathematically elegant. Second, unlike top-K-related metrics, it is efficient to compute. So IMHO RMSE is still a very good choice. Oh, maybe I would exchange it for MSE, to make the progress look more impressive :)
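For reference, here is how the three measures mentioned above relate, computed over a set of (prediction, actual) pairs (a trivial sketch, not from the thread):

    import math

    def mse(pred, actual):
        # mean squared error
        return sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual)

    def rmse(pred, actual):
        # root mean squared error: the competition metric
        return math.sqrt(mse(pred, actual))

    def mae(pred, actual):
        # mean absolute error
        return sum(abs(p - a) for p, a in zip(pred, actual)) / len(actual)

Since MSE is simply the square of RMSE, a 10% reduction in RMSE shows up as roughly a 19% reduction in MSE, hence the "more impressive" remark.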

Yehuda

Last edited by YehudaKoren (2007-12-19 14:46:28)

Offline

 

#14 2007-12-19 15:47:00

sanity
Member
Registered: 2007-11-26
Posts: 10

Re: How useful is a lower RMSE?

I don't disagree with the usefulness of RMSE as a way to evaluate whether algorithm A is better than algorithm B.

But this is of limited use.  It doesn't tell us what the practical implications are of algorithm A having a 5% or 10% lower RMSE than algorithm B.  Why does it matter?  If we don't know the benefit, how do we know whether the costs are worth it?  What if our slaving to get from 0.87 to 0.85 results in a negligible difference in user experience?

Sure, if all you care about is winning the prize then it makes a big difference, but I'm more concerned with the difference CF algorithms can make in general.

When you say that the first 5-star rating occurs after 10% or 20% of movies arranged in descending order of predicted rating, it's useful as a way to judge whether algorithm A is better than algorithm B, but it has limited meaning except as a relative measure.

I'm proposing a simple metric where, if it says that algorithm A is 5% better than algorithm B, that 5% is meaningful: I can turn it into a dollar value (even if that requires a few simplifying assumptions).

In his original post, Yehuda says, "What I learn from this is that small improvements in RMSE translate into very significant improvements in the quality of the top K movies."

But really this notion of "quality" is entirely fuzzy, and while it is undoubtedly a different metric than RMSE, and while it tries to emulate the way that items are actually presented to users (the top X predicted ratings), the numbers produced by it are still only useful when evaluated relative to each other; they have no absolute meaning.

Last edited by sanity (2007-12-19 15:55:20)

Offline

 

#15 2007-12-19 17:36:57

YehudaKoren
Member
Registered: 2007-09-23
Posts: 54

Re: How useful is a lower RMSE?

sanity wrote:

When you say that the first 5-star rating occurs after 10% or 20% of movies arranged in descending order of predicted rating, it's useful as a way to judge whether algorithm A is better than algorithm B, but it has limited meaning except as a relative measure.

I'm proposing a simple metric where, if it says that algorithm A is 5% better than algorithm B, that 5% is meaningful: I can turn it into a dollar value (even if that requires a few simplifying assumptions).
...
while it tries to emulate the way that items are actually presented to users (the top X predicted ratings), the numbers produced by it are still only useful when evaluated relative to each other; they have no absolute meaning.

Ian,

We are in agreement here. My proposed measurement is useful for relative comparisons. Although the absolute value is interpretable, I wouldn't read too much into it. However, absolute measures are very application specific, and it is probably unrealistic to use them unless one has a very specific goal/task in mind.

Yehuda

Last edited by YehudaKoren (2007-12-19 17:38:04)

Offline

 

#16 2007-12-19 19:19:56

netfax
Member
From: Norman, OK
Registered: 2007-05-18
Posts: 78

Re: How useful is a lower RMSE?

I've been one of the skeptics.

Trying to look at things from a customer standpoint, I think Yehuda does have a point in homing in on the movies that I will likely rate a 5.  I rarely rate a 5 and do not mind watching 4's.  As a customer, I've rated over 350 movies so far, and there are so many recommended movies for me to watch, but one look at the title and the short plot summary and I cannot convince myself to rent those movies.  Most of these are predicted 4's, for example movies dating back to the 60's and 70's.

I think sanity is diluting his results (for lack of a better descriptive term) by selecting a rating of 3.5.  What would the results be if sanity set it at 4.5 instead of 3.5?

I would go with Yehuda's approach. I'd like to see movies I'd rate a 5, because to me there are lots of 4's and 3's. There are just too many so-so movies, and so little time. As a customer, I'd like Netflix to find and recommend the movies I'd rate a 5.

One other thing: if a recommender system predicts a 5 and I rate it a 1 or 2, that would be a real turn-off.

Offline

 

#17 2007-12-19 21:35:54

nate
Member
Registered: 2006-10-11
Posts: 31

Re: How useful is a lower RMSE?

YehudaKoren wrote:

To evaluate this, I used all 5-star ratings from the Probe set as a proxy for movies that interest the user. My goal is to find the relative place of these "interesting movies" within the total order of movies sorted by predicted ratings for a specific user.

Thanks for sharing that approach, Yehuda.   I think it's brilliant.  I had played around with a similar metric, which involved hiding half of a user's ratings, generating an ordered set of predictions for that user, and then determining the chance that the highest prediction also present in the hidden half was actually rated  5.    Your approach is less convoluted, and doesn't require masking off half the user's ratings.

I'd be interested if you came across any cases where the rank percentages you calculated diverged from the RMSE --- that is, where a set of predictions with a lower RMSE yielded a higher rank percentage.  Intuitively, I think this is bound to happen, and that an algorithm optimized for recommendation could give a lower rank percentage than an algorithm with a better RMSE that is tuned for prediction.

Have you thought much about this?

Offline

 

#18 2007-12-20 06:48:50

YehudaKoren
Member
Registered: 2007-09-23
Posts: 54

Re: How useful is a lower RMSE?

nate wrote:

Thanks for sharing that approach, Yehuda.   I think it's brilliant.

Thanks!

nate wrote:

I'd be interested if you came across any cases where the rank percentages you calculated diverged from the RMSE --- that is, where a set of predictions with a lower RMSE yielded a higher rank percentage.  Intuitively, I think this is bound to happen, and that an algorithm optimized for recommendation could give a lower rank percentage than an algorithm with a better RMSE that is tuned for prediction.
Have you thought much about this?

Actually, yes, though I haven't experimented enough with this. When I replaced movie-mean with the more substantial "global effects" scheme, which substantially lowers the RMSE, the ranking score went in the other direction, up a little bit, contrary to my expectation. On the other hand, I also tried another method with an RMSE slightly higher than the reported 0.8949, and the ranking score went up as well, as expected. I would bet that RMSE and rank scores do generally move in the same direction, but not in lockstep. Also, rank-score movements are more magnified.

If one really cares about the top-K, he can develop methods that optimize more appropriate measures than RMSE. However, given the elegance of RMSE, optimizing rank-K-based measures might just be too complicated and not worth the effort. But this is just speculation...

Yehuda

Offline

 

#19 2007-12-31 18:28:07

vzn
Member
Registered: 2006-10-04
Posts: 109

Re: How useful is a lower RMSE?

Some interesting dialogue here on different metrics. I once found a good survey paper on collaborative filtering metrics. However, the problem of picking the right metric is to a large degree a nonscientific one.

It would be very difficult to create tests that tend to support one metric over another. I suppose one approach would be to send videos to users based on the different metric systems and then have them rate their overall satisfaction. You can see the problem: it's almost as if measuring the effectiveness of metrics requires a new metric, a meta-metric.

Another important point is that different metrics may be optimizing different aspects of the problem, and I highly suspect there is some Arrow's-theorem-like result (see Wikipedia) that the different goals one would like to optimize for are in fact incompatible.

Arrow's theorem is a surprising but beautiful result from social choice theory about elections: it names several highly desirable properties of voting systems and proves that they are in fact mathematically inconsistent. Various research into elections may be highly relevant here. In election theory people vote on candidates; in collaborative filtering they rate objects. Not so different.

In fact, I can name one such conflict directly that I have read about; I think Bennett/prizemaster may have mentioned it in an interview or talk. The new releases at Netflix and other rental sites are the most in-demand, but also the most expensive for Netflix to buy/distribute.

Therefore, optimizing customer satisfaction would involve many new releases, but this would not maximize profits. On the other hand, maximizing profit would involve sending out a lot of the "long tail" videos that are not so expensive for Netflix to buy.

But, given all that, I do suspect there is to some degree a nonlinear relationship between improvement in the RMSE and customer satisfaction and/or the Netflix bottom line, such that small improvements in RMSE may lead to an improved/worthwhile overall system, i.e., maybe impact the *average* customer in a bigger way. Yehuda seems to have shown this directly with his new metric.

But there are always "diminishing marginal returns" at some point.

It would be very interesting if Netflix did a study of how gains in the RMSE translate into gains in customer satisfaction, retention of existing customers, or acquisition of new customers. It would be nontrivial/difficult to design such an experiment; the latter are all rather difficult to measure as related to rating quality, and the relationship would be indirect.

Netflix did do a study that measured the cost of their rating system versus the cost of other internal systems; I recall seeing it in Bennett's talk. It was found that the rating system was not so expensive compared to other approaches and yet was worthwhile. I don't recall the exact x/y axes. Does anyone else recall that graph? I will look it up if there is further interest, but I would be surprised if Netflix has done any deeper analysis than that.

Collectively, this crowd could build much greater insight into the deeper problems related to, e.g., the choice of metrics if Netflix employees were permitted/willing to collaborate more openly on this forum. But Netflix is in a deathmatch competition with Blockbuster and, as just announced recently, even Apple. It's a small miracle we are witnessing this competition, and a testament to Netflix's appetite for risk and reward.

It's a pity that prizemaster feels he has to be so removed from the dialogue here in the interests of fairness (and there is zilch dialogue with anyone internal at Netflix); that would be a great topic of discussion.

It seems that Netflix is willing to dip their toes in the water, but not jump in. (I don't blame them for that.)

On the other hand, Netflix has done a massive service to all the researchers here by just picking one reasonable metric and eliminating any uncertainty over the metric used. It is conceivable they could have designed the contest completely differently, using some other, possibly less quick, approach, e.g., measuring customer satisfaction, and then having the collaborative community try to figure out the best rating metric as part of the problem.

Another question is: why a $1M prize? A very careful company would have figured out exactly how much each change in RMSE is really worth based on some measurements, but such a measurement seems difficult, again. It seems the CEO Hastings and Bennett probably just came up with some big number that seemed reasonable and that would motivate the crowd. (I've wished for more background on that decision, but despite all the trappings of openness, e.g., this bulletin board, Netflix is again being pretty opaque about a lot of this.)

Is a 10% improvement really worth $1M? More? Less? $1M definitely looks very generous compared to what other companies are offering: zilch. [But, on the other hand, Netflix has a large market capitalization...]

But I personally would like to know of at least some theoretical justification for RMSE over MAE (mean absolute error), as I mentioned elsewhere early in the contest.

What everyone should realize is that there are distinctly different goals of scientific research and business development that overlap to varying degrees in this contest, and are not fully compatible... Netflix has advanced the scientific research tremendously with the contest, but can only go so far.

There are some much deeper scientific issues that I think could literally be explored for decades. One promising one I can think of: I suspect there may be correlations between movie preferences and a person's scores on various psychological tests (e.g., basic dimensions like introverted vs. extroverted, etc.). Now wouldn't *that* be a fascinating study?

Offline

 

#20 2008-01-01 11:29:03

ses84
Member
Registered: 2008-01-01
Posts: 3

Re: How useful is a lower RMSE?

I don't think a meta-metric is necessarily that difficult in this context.  Of course it would have its limitations, but customer-level profitability shouldn't be that hard to calculate.  Given this, Netflix could use a test-and-control design and see if it made more money from the new treatment.  Obviously, there are a bunch of limitations, and you'd have to at least partially implement the solution to test it.  Given that, you would obviously have to use some other method to decide which alternate system, if any, you wanted to test, but once you decided to test, I don't think it would be that difficult to determine whether the new system made Netflix more money.

Obviously any metric is always going to have its limitations, and a decision maker is going to have to take a lot of things into account, but I think there are some decent ways to evaluate whether the new model is an improvement.

Offline

 
