I'm trying to predict red wine quality. There are 6 classes, from 3 to 8, and the dataset is unbalanced, with most examples in classes 5 and 6.
My model performs really badly: accuracy is low, and the loss does not drop below 1.
How can I improve this model?
I tried changing the number of layers and neurons, as well as the batch_size and the learning rate. Any suggestions, including about hyperparameter optimization?
I don't know if you opened the notebook I linked before, but I added accuracy tracking; it is around 0.3-0.5.
Do you think changing the optimizer (e.g. to Adam), adding momentum, or using learning rate decay could help?
Or do you think it is OK to balance the dataset using SMOTE or some undersampling technique?
The DataLoader class has an optional keyword argument sampler which can be used to change the way examples are drawn from the dataset. In this case I'm simply cycling over the classes and picking one example from each, so the batches contain roughly equal numbers of examples from each class.
I can add you as a collaborator on Kaggle if you want to see how to use it. I'll need your Kaggle username, of course.
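A minimal sketch of this idea using PyTorch's built-in WeightedRandomSampler (the data here is a random stand-in for the wine features, not your actual dataset):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# Hypothetical toy data: 11 features, labels 0..5 (quality 3..8 shifted down)
X = torch.randn(200, 11)
y = torch.randint(0, 6, (200,))

# Per-class inverse-frequency weights, then one weight per sample
class_counts = torch.bincount(y, minlength=6).float().clamp(min=1)
sample_weights = 1.0 / class_counts[y]

# replacement=True lets rare classes be drawn repeatedly
sampler = WeightedRandomSampler(sample_weights, num_samples=len(y), replacement=True)
loader = DataLoader(TensorDataset(X, y), batch_size=16, sampler=sampler)

xb, yb = next(iter(loader))
print(tuple(xb.shape))  # (16, 11); batches are now roughly class-balanced
```

Passing `sampler` means you must not also pass `shuffle=True`; the sampler fully decides the draw order.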
I used class weights to try to perform better, but I probably did something wrong. I did it this way:
I used the formula: weight of class c = (size of largest class) / (size of class c)
Quality 3: 3 examples, weight = 404/3 ≈ 134.67
Quality 4: 28 examples, weight = 404/28 ≈ 14.43
Quality 5: 404 examples, weight = 404/404 = 1.0
Quality 6: 398 examples, weight = 404/398 ≈ 1.015
Quality 7: 104 examples, weight = 404/104 ≈ 3.885
Quality 8: 12 examples, weight = 404/12 ≈ 33.67
Then I created this tensor: weights = torch.tensor([134.66666666666666, 14.428571428571429, 1.0, 1.015075376884422, 3.8846153846153846, 33.666666666666664]) and passed it to the loss function as an argument.
However, I obtained worse results than without the weights. Did I do it correctly, or did I make a mistake?
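For reference, the weight computation and the weighted loss can be sketched like this (counts taken from the numbers above; nn.CrossEntropyLoss expects class indices 0..5, so quality 3..8 must be shifted down by 3):

```python
import torch
import torch.nn as nn

counts = [3, 28, 404, 398, 104, 12]  # qualities 3..8
largest = max(counts)
weights = torch.tensor([largest / c for c in counts])

# CrossEntropyLoss scales each sample's loss by the weight of its target class
criterion = nn.CrossEntropyLoss(weight=weights)

logits = torch.randn(8, 6)            # batch of 8 samples, 6 classes
targets = torch.randint(0, 6, (8,))   # quality shifted to 0..5
loss = criterion(logits, targets)
print(round(weights[0].item(), 2))  # 134.67
```

One common pitfall: if your labels are still the raw quality values 3..8 instead of 0..5, the indexing into the weight vector (and into the logits) is wrong, which alone can make weighted training look worse.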
Here is the full code (the case with weights is at the end of the notebook):
Well, the model now gets "penalized" for finding a vague "match-all" solution, so the loss is not as good as it was.
You have only 3 examples of your "quality 3" class. You really need more examples there.
Did you try augmenting the dataset with artificial examples?
BTW, I think you should look into models that deal with fraud detection. There, most transactions are fine and only a small percentage are fraudulent. Since it's a serious problem, I guess there are solutions that achieve good accuracy without the model constantly saying "it's OK".
I know about fraud-detection datasets.
However, I tried using SMOTE from the imblearn library, but I was not able to apply it to tensors. As you've probably noticed, I'm new to PyTorch; in plain Python I didn't have these kinds of problems.
I've just read about a simple method: you take two examples of your imbalanced class and interpolate between them (since they're just "vectors", it's simple). I'm going to try it and see whether it works or not (I've never played with such an imbalanced dataset, so it's beneficial for me to test this too).
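The interpolation idea above can be sketched in a few lines in plain PyTorch (hypothetical stand-in data; this is essentially what SMOTE does, minus the nearest-neighbour selection):

```python
import torch

def interpolate_class(X_minority, n_new, seed=0):
    """Create n_new synthetic rows by interpolating random pairs of rows."""
    g = torch.Generator().manual_seed(seed)
    n = X_minority.shape[0]
    i = torch.randint(0, n, (n_new,), generator=g)
    j = torch.randint(0, n, (n_new,), generator=g)
    t = torch.rand(n_new, 1, generator=g)  # random mixing factor in [0, 1)
    return X_minority[i] + t * (X_minority[j] - X_minority[i])

# e.g. upsample the 12 "quality 8" rows (random stand-in for the real features)
X8 = torch.randn(12, 11)
X8_new = interpolate_class(X8, n_new=100)
print(tuple(X8_new.shape))  # (100, 11)
```

Each synthetic row lies on the segment between two real rows, so it stays inside the convex hull of the minority class.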
Yeah, I know the bad/medium/good binning approach. I did that last year for a project in R, but now I want to do something different.
I don't know how to do SMOTE in PyTorch, so I think I will run SMOTE in Microsoft Azure and then use that dataset_smote for training. Also because I have no more time: tomorrow I have the exam!!
Thank you very much for all your help!