ToDo

Submitted by Neil Rubens on Tue, 04/08/2008 - 11:16

Correlation

If necessary, the correlation computation for influence AL can be sped up: the correlation equation can be replaced by an equivalent formula that avoids computing the means and is therefore much faster to calculate.
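As a rough sketch of the means-free formula mentioned above, assuming the correlation in question is the Pearson coefficient (these notes do not name the measure), the coefficient can be computed from running sums in a single pass. The function name `pearson_single_pass` and the convention of returning 0.0 when the correlation is undefined are assumptions made for illustration.

```python
import math


def pearson_single_pass(x, y):
    """Pearson correlation from running sums, without computing the means first.

    Uses r = (n*Sxy - Sx*Sy) / sqrt((n*Sxx - Sx^2) * (n*Syy - Sy^2)).
    """
    n = 0
    sx = sy = sxx = syy = sxy = 0.0
    for a, b in zip(x, y):
        n += 1
        sx += a
        sy += b
        sxx += a * a
        syy += b * b
        sxy += a * b
    var_x = n * sxx - sx * sx
    var_y = n * syy - sy * sy
    # Assumption: treat an undefined correlation (no variance, or too few
    # points) as 0.0 rather than raising an error.
    if var_x <= 0 or var_y <= 0:
        return 0.0
    return (n * sxy - sx * sy) / math.sqrt(var_x * var_y)
```

For example, `pearson_single_pass([1, 2, 3], [2, 4, 6])` gives 1.0. The single-pass form can lose precision when values are large relative to their spread, which should matter little on small rating scales.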

Additional DBs

See whether the experiments can also be run on NetFlix and Jester

Correctness Verification

In the empirical results, the MAE sometimes does not decrease as much as expected, and at later stages it sometimes actually increases. Need to verify that there are no bugs in the program and that it functions as expected.

1) Examine the correlation measure

2) How the rating is estimated

3) etc.

Neighborhood

Should retrieve the neighborhood based only on the items rated by the user (in real settings we don't have access to the test items, and using test items may skew the neighborhood).

DB

Need to get the neighborhood only once, at the beginning (users that rated one of the test_items).
Don't use the currently rated items when retrieving the neighborhood.

As more items are rated by the user, all we need to do is update the correlation for the users that have rated the newly rated item.
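The incremental update described above could be sketched as follows, assuming a Pearson-style correlation kept as running sums per neighbor. The class name `IncrementalCorrelation` and the `{item_id: {neighbor_id: rating}}` layout are hypothetical and not taken from the actual program.

```python
import math
from collections import defaultdict


class IncrementalCorrelation:
    """Keeps running sums per neighbor, so a new rating by the active user
    only touches the neighbors who rated that same item."""

    def __init__(self, ratings_by_item):
        # Hypothetical layout: {item_id: {neighbor_id: rating}}
        self.ratings_by_item = ratings_by_item
        # Per neighbor: [n, sum_u, sum_v, sum_uu, sum_vv, sum_uv]
        self.sums = defaultdict(lambda: [0, 0.0, 0.0, 0.0, 0.0, 0.0])

    def add_rating(self, item_id, user_rating):
        """Fold the user's new rating into the sums; return the touched neighbors."""
        touched = []
        for neighbor_id, r in self.ratings_by_item.get(item_id, {}).items():
            s = self.sums[neighbor_id]
            s[0] += 1
            s[1] += user_rating
            s[2] += r
            s[3] += user_rating * user_rating
            s[4] += r * r
            s[5] += user_rating * r
            touched.append(neighbor_id)
        return touched

    def correlation(self, neighbor_id):
        """Pearson correlation over co-rated items (0.0 when undefined)."""
        s = self.sums.get(neighbor_id)
        if s is None:
            return 0.0
        n, su, sv, suu, svv, suv = s
        var_u = n * suu - su * su
        var_v = n * svv - sv * sv
        if var_u <= 0 or var_v <= 0:
            return 0.0
        return (n * suv - su * sv) / math.sqrt(var_u * var_v)
```

Neighbors that did not rate the new item keep their previous sums, so their correlation does not need to be recomputed.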

DB

We only need to keep the records of the users that are in the neighborhood; all other users can be deleted from the db. To make this even more efficient, do it in steps (minimizes memory use etc.).

db:

1) delete users with correlation == 0

2) delete users that did not rate any of the test items
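The two deletion steps above might look like this with Python's `sqlite3` module. The schema is an assumption made for illustration: tables `ratings(user_id, item_id, rating)`, `correlations(user_id, corr)` and `test_items(item_id)` are hypothetical and may not match the real db.

```python
import sqlite3


def prune_non_neighbors(conn):
    """Delete ratings of users that cannot be in the neighborhood.

    Assumed (hypothetical) tables:
      ratings(user_id, item_id, rating)
      correlations(user_id, corr)
      test_items(item_id)
    Returns the number of rows deleted by each step.
    """
    cur = conn.cursor()
    # Step 1: delete users whose correlation with the active user is 0.
    cur.execute(
        "DELETE FROM ratings WHERE user_id IN "
        "(SELECT user_id FROM correlations WHERE corr = 0)"
    )
    deleted_zero_corr = cur.rowcount
    # Step 2: delete users that did not rate any of the test items.
    cur.execute(
        "DELETE FROM ratings WHERE user_id NOT IN "
        "(SELECT r.user_id FROM ratings AS r "
        " JOIN test_items AS t ON r.item_id = t.item_id)"
    )
    deleted_no_test_items = cur.rowcount
    conn.commit()
    return deleted_zero_corr, deleted_no_test_items
```

Running step 1 first means step 2 scans a smaller table.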

code:

Each subsequent step takes less time (since the db gets smaller).

When only one item has been rated, there is no need to get the neighborhood and calculate the MAE, since it is at its maximum anyway (the correlation is 0).

Maybe run with NetFlix too :)

X

At each iteration calculate MAE for all of the items (pick next item randomly)
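A minimal sketch of this evaluation loop for the random strategy, assuming the MAE is measured on held-out test ratings. The `predict(known, item_id)` callable is a hypothetical stand-in for the rating estimator, and all names here are made up for illustration.

```python
import random


def mean_absolute_error(predictions, actuals):
    """MAE over the items in `actuals`; both arguments are {item_id: rating}."""
    if not actuals:
        raise ValueError("no items to evaluate")
    return sum(abs(predictions[i] - r) for i, r in actuals.items()) / len(actuals)


def mae_curve_random(train_ratings, test_ratings, predict, seed=None):
    """Reveal the training items in random order and record the MAE on the
    test items after each newly revealed rating."""
    rng = random.Random(seed)
    order = list(train_ratings)
    rng.shuffle(order)  # pick the next item randomly
    known = {}
    curve = []
    for item_id in order:
        known[item_id] = train_ratings[item_id]
        # `predict` is a hypothetical estimator: (known ratings, item) -> rating
        predictions = {i: predict(known, i) for i in test_ratings}
        curve.append(mean_absolute_error(predictions, test_ratings))
    return curve
```

Passing a fixed `seed` makes a run repeatable, which helps when comparing the curve against the other selection methods.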

Profiling

X

May add an enhancement that uses only the top n neighbors. Might implement it using GROUP BY and COUNT (the neighbor that has the most items in common should be selected).

Could select the best on two criteria:

1) the users have rated the most items in common

[optional]

2) the users have rated the most test items (if this criterion is used for all of the methods, it is a fair one)
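The two criteria above can be sketched in plain Python, mirroring what a GROUP BY with COUNT would compute. Treating the optional second criterion as a tie-breaker is an assumption, and `top_n_neighbors` and its argument layout are hypothetical.

```python
def top_n_neighbors(user_items, ratings_by_user, n, test_items=None):
    """Return up to n neighbor ids, ranked by the number of items rated in
    common with the user.

    Assumption: the optional criterion (number of test items rated) is used
    only to break ties; remaining ties are broken by neighbor id.
    ratings_by_user: {neighbor_id: iterable of item ids rated by that neighbor}
    """
    user_items = set(user_items)
    test_items = set(test_items or ())
    scored = []
    for neighbor_id, items in ratings_by_user.items():
        items = set(items)
        common = len(items & user_items)
        if common == 0:
            continue  # no overlap, so not a usable neighbor
        scored.append((common, len(items & test_items), neighbor_id))
    scored.sort(key=lambda t: (-t[0], -t[1], t[2]))
    return [neighbor_id for _, _, neighbor_id in scored[:n]]
```

Leaving `test_items` unset ranks on common items alone, which corresponds to using only the first criterion.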

X -

unexpected results for Random (MAE does not decrease)

Double check that correlation is calculated correctly

double check MAE calculation etc.
