Netflix Prize: Forum

Forum for discussion about the Netflix Prize and dataset.

Announcement

Congratulations to team "BellKor's Pragmatic Chaos" for being awarded the $1M Grand Prize on September 21, 2009. This Forum is now read-only.

#1 2006-10-27 17:37:36

icefox
Member
From: Oslo, Norway
Registered: 2006-10-20
Posts: 73
Website

Netflix Recommender Framework

Introduction
------
I believe that sharing tools and ideas will help everyone.  For me this contest is about finding an algorithm, not about spending a week figuring out how to mmap files in C.  There have been many posts here asking how to deal with that, and it looks like many of you have each spent time and effort building a basic framework of your own (many seemed to know C++, so that is what I have written this in).

I spent the first week trying as hard as I could not to look at this as a programming task, but as a more interesting, fun math problem.  I hacked up several ideas in scripting languages.  They ran slowly, but they were quick enough to write (and easy enough to toss out) that I could spend my time worrying about the algorithm and not the data.  Reading the boards here and blogs elsewhere, I saw a lot of people looking for answers on how to parse the data and, mostly, how to fit it all into memory.  The horrible part was that by the time they figured it out they were no longer excited to work on the real problem!

So I set aside a week and wrote a small framework that anyone can use if they want (or, at the bare minimum, an example of one possible way to do it for people to read).  The framework generates binary files the first time it is run, so there is (almost) no startup delay from then on.  There are two binary blobs: the first is a list of all the movies (~400MB) and the second is a list of users (~200MB).  The average algorithm takes around half a second to run over the entire probe set.  You are of course free to change or improve any of this to fit your algorithms.  The package includes example implementations that you can immediately open in your favorite editor and start modifying.  Last but not least, there are a few tools that you can use on the database.  I do not include my current algorithm, but as I find algorithms that do not work I won't toss them out; I will include them in future releases for everyone to learn from.
All the code is licensed under the BSD license.

Enjoy and good luck!

Benjamin Meyer

P.S. If you win I am in need of a new laptop :)




Netflix Recommender Framework
http://www.icefox.net/programs/?program … rFramework
-------

A small C++ framework that hopefully lets you think about the algorithm and not about how to fit the database into memory.  For more information about the Netflix Prize visit: http://www.netflixprize.com/

If you find this useful I would appreciate getting a book off my Amazon wish list (doesn't need to be new, used is fine by me) http://www.amazon.com/o/registry/FYDMD50IC8E3

Quick Start Instructions
------
Move this into the download/ directory, after which it should look like this:

download/qualifying.txt
download/probe.txt
download/training_set/
download/netflixrecommenderframework/

Run "qmake -recursive" to generate the Makefiles, then go into the algorithms/average directory and run make.  (On Debian you need to run qmake-qt4; qmake points to qmake-qt3.)

cd download/netflixrecommenderframework
qmake -recursive
cd algorithms/average
make

It makes a binary file called "average".

The first time it runs it will read in the entire training data and create several binary files that are then mmap'd in future runs for the training set and probe data.  On a 32-bit system the training set will be 384MB.  The average example implements the average algorithm.  Once the database is created it should take about five seconds in debug mode and half a second in release mode for the entire probe data set.
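To make the mmap step concrete, here is a minimal POSIX-only sketch of mapping one generated binary file read-only. It is not the framework's actual DataBase code: the MappedFile struct, the mapFile/unmapFile helpers, and the assumption that the file is a flat array of 32-bit values are all hypothetical, and Windows would need an emulation layer in place of sys/mman.h.

```cpp
#include <cstddef>
#include <cstdint>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical view of a file holding packed 32-bit values.
struct MappedFile {
    const std::uint32_t *data = nullptr;
    std::size_t count = 0;   // number of 32-bit values
    std::size_t bytes = 0;   // size of the mapping in bytes
};

// Maps `path` read-only. Returns false if the file is missing or empty.
inline bool mapFile(const char *path, MappedFile &out) {
    const int fd = open(path, O_RDONLY);
    if (fd < 0)
        return false;
    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return false;
    }
    const std::size_t bytes = static_cast<std::size_t>(st.st_size);
    void *p = mmap(nullptr, bytes, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    if (p == MAP_FAILED)
        return false;
    out.data = static_cast<const std::uint32_t *>(p);
    out.bytes = bytes;
    out.count = bytes / sizeof(std::uint32_t);
    return true;
}

// Releases the mapping and resets the view.
inline void unmapFile(MappedFile &f) {
    if (f.data != nullptr)
        munmap(const_cast<std::uint32_t *>(f.data), f.bytes);
    f = MappedFile();
}
```

Because the operating system pages the file in lazily, a later run can start working almost immediately instead of re-parsing the text training set.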

Now that you have seen it working, open up main.cpp in your favorite editor and modify the determine() function with your algorithm idea.
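The real determine() signature is not shown here, so the following is only a standalone sketch of what an "average" prediction boils down to. The averageVote name, the plain vector of votes, and the caller-supplied fallback value are assumptions for illustration, not the framework's API.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: predicts a rating as the mean of a movie's known
// votes. `fallback` is returned when the movie has no votes at all.
inline double averageVote(const std::vector<int> &votes, double fallback) {
    if (votes.empty())
        return fallback;
    long sum = 0;
    for (std::size_t i = 0; i < votes.size(); ++i)
        sum += votes[i];
    return static_cast<double>(sum) / static_cast<double>(votes.size());
}
```

A new algorithm idea would replace the body of such a function while the probe loop around it stays the same.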

If an algorithm you try isn't good enough and you are going to toss it out, if you e-mail it to me I will include it in future releases for everyone to learn from.

There is a doubleaverage algorithm for you to look at that shows some usage of the User class.

Included Tools
------

scrubprobedata will remove the probe data from the training files and generate a new probe file containing the answers.

explorer is a small GUI application that lets you explore the data.

movie - pass it a movie id and it will output all the votes for the movie in "user,vote" format

user - pass it a user id and it will output all the votes by that user in "movie,vote" format

The Code
-------

The cross-platform toolkit Qt version 4 is used for the build system, test system, and basic file operations.  This allows the code to be used on Windows, OS X, and Linux.  You can download the GPL version of the library at http://www.trolltech.com/ if you do not already have it on your system.  Feel free to use the existing build system or replace it with your own preferred one.

This code is licensed under the BSD license.

The code for the framework is located in the src/ directory.  There are autotests for the included classes in the autotests directory.  There are five main classes in the framework.

DataBase - creates, loads, and holds the pointers to the mmap'd arrays
Movie - Class for working with a movie, how many votes it has, etc
User -  Class for working with a single user, how many votes it made, etc
Probe - Class that loads the probe data and runs it through the Algorithm
RMSE - Class to keep track of the current rmse score (used by the probe)
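As a rough idea of what a running RMSE tracker involves, here is a small sketch. It is not the framework's RMSE class: the RmseAccumulator name and its methods are hypothetical, and it simply accumulates squared errors and takes the square root of their mean.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical running root-mean-square-error tracker.
class RmseAccumulator {
public:
    // Records one prediction together with the true rating.
    void add(double predicted, double actual) {
        const double err = predicted - actual;
        sumSquared_ += err * err;
        ++count_;
    }

    // RMSE over everything added so far; 0.0 if nothing was added.
    double result() const {
        if (count_ == 0)
            return 0.0;
        return std::sqrt(sumSquared_ / static_cast<double>(count_));
    }

    std::size_t count() const { return count_; }

private:
    double sumSquared_ = 0.0;
    std::size_t count_ = 0;
};
```

Feeding it every probe entry and reading result() at the end gives a single score that can be compared between algorithm ideas.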

BUGS
------
If you find bugs and report them please include a modified autotests file that can reproduce the bug.

I have done my development in Linux and OS X so please report any Windows compiler errors you get.

---
Explorer GUI tool screenshot:
http://www.icefox.net/blogs/explorer.png

Last edited by icefox (2006-10-28 06:56:46)


#2 2006-10-27 18:47:40

Algogene
Member
Registered: 2006-10-23
Posts: 4
Website

Re: Netflix Recommender Framework

Neato!  Sounds like this will be helpful to many, thanks for sharing.

:D


o/


#3 2006-10-27 20:06:36

RudeDude
Member
Registered: 2006-10-16
Posts: 38

Re: Netflix Recommender Framework

You rock. I'll certainly be contributing some code and eventually some sort of donation to you :)


#4 2006-10-27 20:57:31

voidanswer
Member
Registered: 2006-10-10
Posts: 99

Re: Netflix Recommender Framework

very helpful, thanks.

edit:

now we just need to throw some distributed computing in the mix :)

Last edited by voidanswer (2006-10-27 21:18:28)


#5 2006-10-27 21:21:01

RudeDude
Member
Registered: 2006-10-16
Posts: 38

Re: Netflix Recommender Framework

..\..\src\database.cpp:36:22: sys/mman.h: No such file or directory

Hmmm, I guess this isn't compiling as easily as I thought.

* Note, I am trying to compile and run on Windows XP.

Last edited by RudeDude (2006-10-27 21:32:29)


#6 2006-10-28 06:37:28

icefox
Member
From: Oslo, Norway
Registered: 2006-10-20
Posts: 73
Website

Re: Netflix Recommender Framework

RudeDude wrote:

..\..\src\database.cpp:36:22: sys/mman.h: No such file or directory

Hmmm, I guess this isn't compiling as easily as I thought.

* Note, I am trying to compile and run on Windows XP.

Hit the first page that Google gave me and found this:
http://www.genesys-e.de/jwalter/mix4win.htm  Let me know if it works and I will add it in.

Q_OS_WIN is already defined so you can do

#ifdef Q_OS_WIN
// ... copy in the code from the link above
#else
#include <sys/mman.h>
#endif


#7 2006-10-28 06:44:06

jml
Member
From: Southern California
Registered: 2006-10-10
Posts: 45

Re: Netflix Recommender Framework

..\..\src\database.cpp:36:22: sys/mman.h: No such file or directory

Hmmm, I guess this isn't compiling as easily as I thought.

* Note, I am trying to compile and run on Windows XP.

Quick and dirty way: grab winmmap.{c,h} from here and add the appropriate include and such.

This framework looks pretty nice. Cleaner than my code, at least ;)


#8 2006-10-28 07:17:09

RudeDude
Member
Registered: 2006-10-16
Posts: 38

Re: Netflix Recommender Framework

Thanks, I did a bunch of searching last night and didn't find these two links.
I'm trying the emulation code from ImageMagick now and having some trouble, so I might have to read "Emulating Unix Memory". Or it's something wrong with the resulting Makefiles, I'm not sure.


#9 2006-10-28 07:28:08

icefox
Member
From: Oslo, Norway
Registered: 2006-10-20
Posts: 73
Website

Re: Netflix Recommender Framework

After including the files and modifying the includes, add the following to netflixresultsgenerator.pri and then re-run qmake-qt4:

win32 {
    SOURCES += $$PWD/winmmap.c
    HEADERS += $$PWD/winmmap.h
}


#10 2006-10-28 07:36:45

RudeDude
Member
Registered: 2006-10-16
Posts: 38

Re: Netflix Recommender Framework

I did change the PRI file when I realized it wouldn't make it into the MAKE file otherwise. winmmap.c is compiling into the .o file but it looks like I'm getting linker problems in the end.

Code:

gcc -c -g -g -Wall -DUNICODE -DQT_LARGEFILE_SUPPORT -DQT_DLL -DQT_CORE_LIB -DQT_THREAD_SUPPORT -DQT_NEEDS_QMAIN -I"C:/Dev/Qt/4.2.1/include/QtCore" -I"C:/Dev/Qt/4.2.1/include/QtCore" -I"C:/Dev/Qt/4.2.1/include" -I"." -I"C:/djr7m/netflix/download/netflixrecommenderframework/src" -I"C:/Dev/Qt/4.2.1/include/ActiveQt" -I"debug" -I"." -I"c:\Dev\Qt\4.2.1\mkspecs\win32-g++" -o debug\winmmap.o ..\..\src\winmmap.c


g++ -mthreads -Wl,-enable-stdcall-fixup -Wl,-enable-auto-import -Wl,-enable-runtime-pseudo-reloc -Wl,-subsystem,windows -o "debug\average.exe" debug\winmmap.o debug\database.o debug\movie.o debug\probe.o debug\user.o debug\main.o  -L"c:\Dev\Qt\4.2.1\lib" -lmingw32 -lqtmaind -lQtCored4

debug\database.o(.text+0xc62): In function `ZN8DataBaseD2Ev':
C:/djr7m/netflix/download/netflixrecommenderframework/algorithms/average/../../src/database.cpp:64: undefined reference to `munmap(void*, unsigned int)'


#11 2006-10-28 11:12:25

RudeDude
Member
Registered: 2006-10-16
Posts: 38

Re: Netflix Recommender Framework

It looks like I needed to name my "winmmap.c" as "winmmap.cpp" to get the .o file in the right format or something??  Well average.exe is now crunching so we'll see what happens.
--- later ---
I'm getting movie.dat generated but something crashes after calculations run for a while on the movie.index (which ends up at 69.4KB and no more). I'm going to try and debug.
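The rename most likely helped because of C++ name mangling: the linker error quoted earlier shows database.cpp looking for a C++-style symbol `munmap(void*, unsigned int)`, while a .c file compiled by gcc exports a plain C symbol. An alternative to renaming, sketched below under that assumption, is to wrap the declarations in extern "C". The demo_munmap name and its trivial body are hypothetical stand-ins, not the contents of winmmap.h.

```cpp
#include <stddef.h>

/* Hypothetical header section: with this guard, C++ callers reference the
   unmangled C symbol that a .c file compiled by gcc would export. */
#ifdef __cplusplus
extern "C" {
#endif

int demo_munmap(void *addr, size_t length);

#ifdef __cplusplus
}
#endif

/* Stand-in definition so the sketch links on its own. In a real project it
   would live in the .c file. It inherits C linkage from the declaration. */
int demo_munmap(void *addr, size_t length) {
    return (addr != NULL && length > 0) ? 0 : -1;
}
```

With the guard in place the same header can be included from both C and C++ sources, so the .c file would not need to be renamed.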

Last edited by RudeDude (2006-10-28 14:32:54)


#12 2006-10-31 00:30:19

CS1
Member
From: San Jose, CA
Registered: 2006-10-02
Posts: 151

Re: Netflix Recommender Framework

Excellent contributions.  I think you're an example of a special category of member, perhaps we say "Mensch", for those who make significant contributions to the community.

By the way, on your Amazon wishlist, I recommend dropping the Norvig & Russell book and replacing it with Hastie, Tibshirani, and Friedman.

Cheers,

CS


#13 2006-10-31 02:22:40

asafdav2
Member
Registered: 2006-10-26
Posts: 9

Re: Netflix Recommender Framework

thanks a lot for this
if anyone converts it to pure c++ (no qt) and feels like sharing, it'll help greatly - i'm having lots of problems with qt


#14 2006-10-31 14:28:49

icefox
Member
From: Oslo, Norway
Registered: 2006-10-20
Posts: 73
Website

Re: Netflix Recommender Framework

CS1 wrote:

Excellent contributions.  I think you're an example of a special category of member, perhaps we say "Mensch", for those who make significant contributions to the community.

By the way, on your Amazon wishlist, I recommend dropping the Norvig & Russell book and replacing it with Hastie, Tibshirani, and Friedman.

Cheers,

CS

Thanks for the recommendation, it looks good and I have added it to my list.  As for the Norvig & Russell book, did you find that it covered a lot of the same material, or is it just not a very good book?


#15 2006-10-31 14:32:41

icefox
Member
From: Oslo, Norway
Registered: 2006-10-20
Posts: 73
Website

Re: Netflix Recommender Framework

asafdav2 wrote:

thanks a lot for this
if anyone converts it to pure c++ (no qt) and feels like sharing, it'll help greatly - i'm having lots of problems with qt

What problems are you having with Qt?  Are you using Qt version 4?  If you use version 3 it will probably not work.  On the Linux side even Debian ships qt4 packages so you shouldn't have any problems, on the Mac there is a nice install package, and on Windows it is the same thing; their installer even installed mingw for me (I don't own a copy of Visual C++ at home).  Note that on Debian at least the qmake that you need to run is qmake-qt4; qmake by default is aliased to qmake-qt3.

Qt provides a nice framework for cross platform development.  I have an Apple laptop, a Linux desktop and can dual boot into Windows.  I can work on my software and use it in all three (granted I don't spend much time in Windows these days).  For me the fact that they will provide all of this in a GPL toolkit is fantastic as it lets me spend my time not figuring out each build system or quirks with vs6's compiler, but hacking on my projects.

Last edited by icefox (2006-10-31 14:57:53)


#16 2006-11-01 06:34:06

asafdav2
Member
Registered: 2006-10-26
Posts: 9

Re: Netflix Recommender Framework

i'm trying to compile under debian. when running
qmake - recursive
i'm getting the error msg -
QMAKESPEC has not been set, so configuration cannot be deduced.

when trying -
qmake-qt4
i'm getting -
Command not found.

and when trying
qmake -qt4
i'm getting -
***Unknown option -qt4

Last edited by asafdav2 (2006-11-01 06:37:02)


#17 2006-11-01 06:44:41

RudeDude
Member
Registered: 2006-10-16
Posts: 38

Re: Netflix Recommender Framework

So what version of Qt is installed then? You want your CWD to be the "algorithms/average" directory when you run the qmake recursive.

QMAKESPEC is simply an environment variable that indicates if you're in windows or unix etc. I believe you can get around the variable by using the option "-unix", so try "qmake -unix -recursive" or just define the variable (I believe you need it to be "linux-g++" )

"QMAKESPEC=linux-g++ qmake -recursive"

I hope this helps.


#18 2006-11-01 06:53:53

asafdav2
Member
Registered: 2006-10-26
Posts: 9

Re: Netflix Recommender Framework

thanks for the reply
looks like it doesnt help -

qmake -unix -recursive
gives the same error msg

set QMAKESPEC=linux-g++
qmake -recursive

same error msg again

i'm not really sure what version of qt is installed (i'm using debian in my university), how do you see it ?

Last edited by asafdav2 (2006-11-01 06:55:34)


#19 2006-11-01 06:58:29

RudeDude
Member
Registered: 2006-10-16
Posts: 38

Re: Netflix Recommender Framework

So, what version of Qt?
Depending on the shell you are using the "set" command may not make the variable appear globally which is why I had put it on one line with qmake.
In windows Qt actually creates a "shell launcher" that sets environmental variables and library search paths. Perhaps the Qt installer in debian did something similar?

Sorry, but I'm out of ideas, I don't know Qt or qmake so well. You could always fall back to manually calling g++ or making your own Makefile since there aren't too many files to compile.


#20 2006-11-01 11:36:00

icefox
Member
From: Oslo, Norway
Registered: 2006-10-20
Posts: 73
Website

Re: Netflix Recommender Framework

asafdav2 wrote:

i'm trying to compile under debian. when running
qmake - recursive
i'm getting the error msg -
QMAKESPEC has not been set, so configuration cannot be deduced.

when trying -
qmake-qt4
i'm getting -
Command not found.

and when trying
qmake -qt4
i'm getting -
***Unknown option -qt4

not "qmake" with an argument "-qt4", but the executable "qmake-qt4".  On Debian the qt4 package that contains qmake-qt4 is qt4-dev-tools I believe.


#21 2006-11-02 01:53:54

asafdav2
Member
Registered: 2006-10-26
Posts: 9

Re: Netflix Recommender Framework

well as i said qmake-qt4 gives 'unknown command' error, but seems like qmake-qt3 works and even produces a Makefile. i guess i can deduce this is what we have installed. anyway when running make i'm getting a few error msgs with the first one saying that it cant locate probe.h (maybe because i didnt use qmake-qt3 -recursive? i tried but it said he doesnt know this option, or maybe because i cant compile it using qt3 - in that case i guess i'll have to install linux at home)


#22 2006-11-02 02:27:43

jml
Member
From: Southern California
Registered: 2006-10-10
Posts: 45

Re: Netflix Recommender Framework

There are big differences between Qt3 and Qt4, so you do need to install Qt4 to compile and run. Looks like on Debian you need the package libqt4-dev (not dev tools) to get qmake-qt4.

Last edited by jml (2006-11-02 02:32:13)


#23 2006-11-02 07:06:32

mlearing
Member
Registered: 2006-11-02
Posts: 7

Re: Netflix Recommender Framework

In "void DataBase::generateUserDatabase()",
why do you sort based on votes?  qSort(userVotes.begin(), userVotes.end(), votelessthan)

How could I skip the sort and just save in the order of movie id?

Thanks a lot,


#24 2006-11-02 07:27:32

icefox
Member
From: Oslo, Norway
Registered: 2006-10-20
Posts: 73
Website

Re: Netflix Recommender Framework

mlearing wrote:

In "void DataBase::generateUserDatabase()",
why do you sort based on votes?  qSort(userVotes.begin(), userVotes.end(), votelessthan)

How could I skip the sort and just save in the order of movie id?

Thanks a lot,

The user database is compressed, I implemented something similar to what was already mentioned by in5ane here: http://www.netflixprize.com/community/v … php?id=333 

It lists the number of movies that are all ranked a 5 and then lists them, then the number of movies that are ranked a 4 and then lists them, etc. (see the post linked above for more details).  This lets all the data fit into less memory and is very handy for algorithms that want all the 5 votes quickly.  It is this layout that forces the votes to be in this order.  It is debatable whether the memory savings are worth it, as you have to unpack the data.  I have optimized the User class (and will release it this weekend), but it is still slower than direct array access, of course.  It could be changed to be like the movie database and just be a uint array of movies and votes (or maybe even a database option?).

Right now my algorithms mostly deal with movies and not users so I haven't spent too much time on the Users.  Feel free to come up with a different storage for users and I'll include it in the next release.
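For readers who want to experiment with the grouped-by-rating idea, here is a small sketch of one way to lay it out. It is an illustration only and not the framework's actual on-disk format: the Vote struct, the packByRating and moviesWithRating helpers, and the use of one 32-bit count per rating are all assumptions.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical input record: one vote by a single user (rating is 1..5).
struct Vote {
    std::uint32_t movie;
    int rating;
};

// Packs one user's votes as, for rating 5 down to 1:
//   [count][movie id]...[movie id]
// so the rating itself never has to be stored per vote.
inline std::vector<std::uint32_t> packByRating(const std::vector<Vote> &votes) {
    std::vector<std::uint32_t> out;
    for (int r = 5; r >= 1; --r) {
        const std::size_t countPos = out.size();
        out.push_back(0);  // placeholder for this rating's count
        for (std::size_t i = 0; i < votes.size(); ++i) {
            if (votes[i].rating == r) {
                out.push_back(votes[i].movie);
                ++out[countPos];
            }
        }
    }
    return out;
}

// Returns the movies given `rating` by skipping over the earlier groups.
// Assumes `packed` was produced by packByRating.
inline std::vector<std::uint32_t> moviesWithRating(
        const std::vector<std::uint32_t> &packed, int rating) {
    std::size_t pos = 0;
    for (int r = 5; r >= 1; --r) {
        const std::uint32_t n = packed[pos++];
        if (r == rating)
            return std::vector<std::uint32_t>(packed.begin() + pos,
                                              packed.begin() + pos + n);
        pos += n;
    }
    return std::vector<std::uint32_t>();
}
```

The trade-off matches the one described above: fetching all of a user's 5s is a single slice, but looking up the rating for one specific movie means scanning the groups.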


#25 2006-11-03 07:55:45

shauno
Member
Registered: 2006-11-02
Posts: 4

Re: Netflix Recommender Framework

This is fantastic - just what I needed to get off the RDBMS, which was great for generating ideas, but Slooowww (on my laptop)

Has anyone gotten this to work on a Windows machine?  I am able to compile it, generate the initial "cache" database, but I error out during the creation of the cleverly compressed User database.  I notice the process hits about 1.1GB of memory before crashing.

Could  this be a Windows problem?  the drwtsn error log didn't help me, but all those movs, pushes and pops...in over my head.

Thanks
