Strange problem in Collaborative filtering

Hey guys, I got a strange error in my code:
Basically, with a range of 1669 rows the code works, but with one more row it crashes. I uploaded the screenshots for clarity. I'm predicting the ratings of all 1682 movies for user no. 1, because I'd like to rank them later and suggest the highest-rated movies to that user.
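A minimal sketch of the ranking step, assuming a plain dot-product model: each movie's score for one user is the dot product of the user's vector with the movie's vector plus a movie bias, and the highest scores become the suggestions. The random arrays and the names user_vec, movie_vecs and movie_bias are hypothetical stand-ins for trained parameters, not fastai's API.

```python
import numpy as np

rng = np.random.default_rng(0)
n_movies, n_factors = 1682, 40  # 1682 movies as in the thread; 40 factors is an arbitrary choice

# Hypothetical "learned" parameters (random stand-ins for a trained model)
user_vec = rng.normal(size=n_factors)                # vector for one user
movie_vecs = rng.normal(size=(n_movies, n_factors))  # one row per movie
movie_bias = rng.normal(size=n_movies)               # one bias per movie

# Predicted score for every movie: dot product plus movie bias
scores = movie_vecs @ user_vec + movie_bias

# Row positions of the 10 highest-scoring movies, best first
top10 = np.argsort(scores)[::-1][:10]
print(top10, scores[top10])
```

A per-user bias would add the same constant to every score, so it would not change the ranking. Note that top10 holds row positions in the table, not raw movie IDs.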

I found that I can only see 64 rows.
Also, if I use 1670, the code crashes.

First: wrong course.

Second: I'm not sure what learn.model is. And why do you pass only users and items? What does that mean, by the way? Do you learn items for every user? I suppose there's an additional tensor that represents the ratings (the code seems to suggest so). If you don't change it as well, it won't work out of the box.

Hi Seb, this is the last part of the video Lesson 6 - Unsupervised Learning and Recommendations, which I'm working through right now. Link: Lesson 6 - Unsupervised Learning and Recommendations | Jovian (it starts at the 1:33:00 mark).

I didn't change the code much; it is the original notebook from the course video. The goal is to predict users' ratings for all the movies (items). In this case I chose to predict the ratings of all the movies (1682 in total in the training set) for user no. 1.

This is my notebook, by the way: micmiao/movielens-fastai - Jovian
The last part is where I got the error.

It seems you're zipping over ratings but never using them. zip stops as soon as the shortest of its input iterables runs out, so the for loop runs only as many times as that shortest iterable has elements. I suspect ratings is the shortest one (it actually contains only 64 examples), which is why you can see only 64 examples.
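A small demonstration of that truncation, with hypothetical lists standing in for the notebook's tensors (1682 movie IDs, but only 64 ratings):

```python
# Hypothetical stand-ins: 1682 movie IDs for user 1, but only 64 ratings
movies = list(range(1, 1683))
users = [1] * 1682
ratings = [4.0] * 64

rows = list(zip(users, movies, ratings))
print(len(rows))  # 64: zip stops at the shortest iterable
```

On Python 3.10 and later, zip(..., strict=True) raises a ValueError on a length mismatch instead of silently truncating, which makes this kind of bug visible immediately.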

Also, you never actually show the predictions: you use the first three elements of each zip tuple, but preds is the last one.
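A sketch of the difference, using hypothetical toy values in place of the notebook's users, movies, ratings and preds. Slicing the first three elements drops the prediction, while unpacking by name makes it explicit which value is printed:

```python
# Hypothetical toy values standing in for the notebook's tensors
users = [1, 1, 1]
movies = [10, 20, 30]
ratings = [4.0, 3.0, 5.0]
preds = [3.8, 2.9, 4.6]

for row in zip(users, movies, ratings, preds):
    print(row[:3])  # user, movie, rating only; the prediction is never shown

# Unpacking by name makes it obvious which value is which
shown = []
for user, movie, pred in zip(users, movies, preds):
    shown.append((user, movie, pred))
    print(f"user {user} movie {movie} predicted {pred:.2f}")
```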

The relevant part starts at 2:05:00, where I tried to accomplish the same task.

Yeah, you’re right. I saw the error. Let me try again.

I removed ratings, but the tensor-size problem is still there.
If I limit the size to 1669, the code works, but if I increase it to 1670, it crashes as in the screenshots.
Since there are 1682 movies, I can't predict for all of them. I also don't understand why it crashes.

Maybe the movie IDs are not contiguous as one would expect (perhaps movie 1670 is missing?). It might be a good idea to check whether there's any movie with ID 1670 to confirm that.
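One way to run that check with pandas, sketched on a hypothetical toy table. The column name "Movie ID" comes from the thread; "User ID", "Rating" and the values are assumptions for illustration:

```python
import pandas as pd

# Hypothetical toy ratings table; in the thread this would be ratings_df
ratings_df = pd.DataFrame({
    "User ID": [1, 1, 2, 3],
    "Movie ID": [1, 2, 4, 1670],
    "Rating": [5, 3, 4, 2],
})

# True if at least one row has Movie ID 1670
has_1670 = bool((ratings_df["Movie ID"] == 1670).any())
print(has_1670)
```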

I would suggest changing how you get the movies: instead of arange, take the list of unique values from ratings_df with unique().
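A sketch of the difference on hypothetical toy data: arange assumes every ID from 1 to the maximum exists, while unique() returns only the IDs that actually appear. pandas returns unique values in order of first appearance, so they are sorted here for readability.

```python
import numpy as np
import pandas as pd

ratings_df = pd.DataFrame({"Movie ID": [1, 2, 2, 4, 7]})  # hypothetical toy data

# arange assumes every ID from 1 up to the maximum is present
movies_arange = np.arange(1, ratings_df["Movie ID"].max() + 1)
# unique() returns only IDs that actually occur
movies_unique = np.sort(ratings_df["Movie ID"].unique())

print(movies_arange)  # [1 2 3 4 5 6 7]
print(movies_unique)  # [1 2 4 7]
```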

I checked, Seb, and it is in there: the IDs go from 1 to 1682. :worried:

Let me try unique() to double-check. Result: nope, it doesn't work. It gives me the same error, and with all 1682 movies I can't run that line at all anymore. ratings_df['Movie ID'].unique() is correct, though; it has a length of 1682.

I didn't mean the length. You could have gaps in the movie IDs and still have 1682 of them.

Example:
[1, 2, 4, 6, 7, 8]
The list has 6 elements, but some are missing.
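A minimal way to find such gaps is to compare the IDs against the full range between their minimum and maximum, shown here on the example list above:

```python
ids = [1, 2, 4, 6, 7, 8]  # the example list above

# Every ID that "should" exist if the numbering were contiguous
expected = set(range(min(ids), max(ids) + 1))
missing = sorted(expected - set(ids))
print(len(ids), missing)  # 6 [3, 5]
```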

Anyway, I've checked the files. It seems that movie 1670 happens to be rated only once.
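Counting ratings per movie with pandas value_counts() shows this kind of thing quickly. The table below is a hypothetical toy example, not the real dataset:

```python
import pandas as pd

ratings_df = pd.DataFrame({"Movie ID": [1, 1, 1, 2, 2, 1670]})  # hypothetical toy data

# Number of ratings per movie ID
counts = ratings_df["Movie ID"].value_counts()
print(counts[1670])         # 1
print(counts[counts == 1])  # every movie rated exactly once
```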

My guess is that it got "removed" from the training data because its only rating ended up in the validation set.
So the model never had a chance to learn a vector for this movie, which makes it impossible to look up that vector and use it in the u*m calculation. After a quick search I found this.
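A sketch of how a random split can leave a movie only in the validation set. It assumes a plain 80/20 random row split on hypothetical toy data; the actual split in the course notebook may be configured differently. A movie with a single rating lands only in validation with probability equal to the validation fraction.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
# Hypothetical toy ratings: movie 1670 has a single rating
ratings_df = pd.DataFrame({
    "Movie ID": [1, 1, 1, 2, 2, 2, 3, 3, 3, 1670],
})

# Random 80/20 train/validation split of the rows
perm = rng.permutation(len(ratings_df))
n_valid = int(0.2 * len(ratings_df))
valid = ratings_df.iloc[perm[:n_valid]]
train = ratings_df.iloc[perm[n_valid:]]

# Movies that appear only in validation never get a learned vector
unseen = set(valid["Movie ID"]) - set(train["Movie ID"])
print(unseen)
```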

You would have to play around with the data variable and get the possible movie IDs from it. Not sure how, though; I'm not that used to fastai.

You could also drop any movies that appear fewer times than a specific threshold (how big or small is up to you). I think 100 would be a good starting point.
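A minimal pandas sketch of that filter, assuming a ratings table with a "Movie ID" column. The toy data and the threshold of 2 are hypothetical; on the real dataset you would use something like the 100 suggested above.

```python
import pandas as pd

ratings_df = pd.DataFrame({"Movie ID": [1, 1, 1, 2, 2, 1670]})  # hypothetical toy data
min_ratings = 2  # the thread suggests ~100 for the real dataset

# For every row, the total number of ratings its movie has
per_row_counts = ratings_df["Movie ID"].map(ratings_df["Movie ID"].value_counts())
filtered = ratings_df[per_row_counts >= min_ratings]
print(sorted(filtered["Movie ID"].unique().tolist()))  # [1, 2]
```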

Edit:
It seems you could use data.train_ds.x.classes.values() to get and check the available IDs.

OK Seb. Thanks so much for your help! I really appreciate your time! I'll read through the link and tweak it again. Hopefully I can find a way to do it. Thanks a lot again!

I checked data.train_ds.x.classes.values(): all the values are there. Damn.

There should also be a list of vectors, one per movie. Is 1670 there?
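One way to phrase that question in code is to check whether an index is smaller than the number of rows in the item embedding table. The sketch below uses a small hypothetical NumPy table (5 rows, 3 factors) as a stand-in for the model's real item weights. In PyTorch, looking up an out-of-range index in an embedding fails in a similar way.

```python
import numpy as np

rng = np.random.default_rng(0)
n_rows, n_factors = 5, 3  # hypothetical: a tiny item table with 5 rows
item_weights = rng.normal(size=(n_rows, n_factors))

def item_vector(idx: int) -> np.ndarray:
    """Return the learned vector at row `idx`, or raise if the table has no such row."""
    if not 0 <= idx < item_weights.shape[0]:
        raise IndexError(f"row {idx} not in table of {item_weights.shape[0]} rows")
    return item_weights[idx]

print(item_vector(4).shape)  # (3,)
try:
    item_vector(5)
except IndexError as err:
    print("lookup failed:", err)
```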

I'm too much of a novice to fully solve this. I'll study more deep learning and then come back to it.