 

CNN model is overfitting to data after reaching 50% accuracy

I am trying to identify 3 mental states (classes) based on EEG connectome data. The shape of the data is 99x1x34x34x50x130 (originally graph data, but now represented as a matrix), where the axes respectively represent [subjects, channel, height, width, freq, time series]. For the sake of this study, I can only input a 1x34x34 image of the connectome data. Previous studies found that the alpha band (8-12 Hz) gave the most information, so the dataset was narrowed down to 99x1x34x34x4x130. Previous machine learning techniques such as SVMs reached a testing accuracy of ~75%, so my goal is to achieve a greater accuracy given the same input (1x34x34). Since my data is very limited (subjects 1-66 for training and 66-99 for testing, a fixed split with a 1/3 class distribution), I thought of splitting the data along the time-series axis (6th axis) and then averaging down to a shape of 1x34x34 (e.g. from 1x34x34x4x10, where 10 is a random sample of the time series). This gave me ~1500 samples for training and 33 for testing (the test set is fixed; class distributions are 1/3).
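The windowing/averaging scheme described above could be sketched roughly like this (the window count, window length, and random data here are all illustrative placeholders, not the actual pipeline):

```python
import numpy as np

rng = np.random.default_rng(0)
# Placeholder for the real alpha-band data: [subjects, channel, H, W, freq, time]
data = rng.standard_normal((99, 1, 34, 34, 4, 130))

def make_samples(subject, n_windows=20, win_len=10):
    """Average random time windows (and the 4 freq bins) down to 1x34x34."""
    samples = []
    for _ in range(n_windows):
        start = rng.integers(0, subject.shape[-1] - win_len + 1)
        win = subject[..., start:start + win_len]   # 1x34x34x4x10
        samples.append(win.mean(axis=(-2, -1)))     # -> 1x34x34
    return np.stack(samples)

# Subjects 1-66 for training, as in the fixed split described above
train = np.concatenate([make_samples(data[i]) for i in range(66)])
print(train.shape)  # (1320, 1, 34, 34)
```

With ~20 windows per subject this yields on the order of the ~1500 training samples mentioned.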

Model:

SimpleCNN(
  (conv1): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (conv2): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))    
  (pool1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (drop1): Dropout(p=0.25, inplace=False)
  (fc1): Linear(in_features=9248, out_features=128, bias=True)
  (drop2): Dropout(p=0.5, inplace=False)
  (fc2): Linear(in_features=128, out_features=3, bias=True)
)
CrossEntropyLoss()
Adam (
Parameter Group 0
    amsgrad: False
    betas: (0.9, 0.999)
    eps: 1e-08
    lr: 5e-06
    weight_decay: 0.0001
)
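For reference, a PyTorch module matching the printed repr above would look like this (the forward pass, including ReLU placement, is my reconstruction and an assumption; only the layer shapes are confirmed by the repr):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(32, 32, kernel_size=3, padding=1)
        self.pool1 = nn.MaxPool2d(2, 2)
        self.drop1 = nn.Dropout(0.25)
        self.fc1 = nn.Linear(32 * 17 * 17, 128)  # 34x34 pooled to 17x17 -> 9248
        self.drop2 = nn.Dropout(0.5)
        self.fc2 = nn.Linear(128, 3)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = self.pool1(F.relu(self.conv2(x)))
        x = self.drop1(x).flatten(1)
        x = self.drop2(F.relu(self.fc1(x)))
        return self.fc2(x)

model = SimpleCNN()
optimizer = torch.optim.Adam(model.parameters(), lr=5e-6, weight_decay=1e-4)
out = model(torch.zeros(2, 1, 34, 34))
print(out.shape)  # torch.Size([2, 3])
```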

Results: The training set can achieve an accuracy of 100% with enough iterations, but at the cost of the testing set accuracy. After around 20-50 epochs of training, the model starts to overfit to the training set and the test set accuracy starts to decrease (same with the loss).
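Since the test accuracy peaks in the 20-50 epoch range before degrading, one generic mitigation is early stopping. A minimal, framework-agnostic sketch (function names and the toy accuracy sequence are illustrative):

```python
import copy

def train_with_early_stopping(train_epoch, val_accuracy, get_state,
                              patience=10, max_epochs=1000):
    """Stop once validation accuracy hasn't improved for `patience` epochs;
    return the best accuracy and a snapshot of the best weights."""
    best_acc, best_state, bad = -1.0, None, 0
    for _ in range(max_epochs):
        train_epoch()
        acc = val_accuracy()
        if acc > best_acc:
            best_acc, best_state, bad = acc, copy.deepcopy(get_state()), 0
        else:
            bad += 1
            if bad >= patience:
                break
    return best_acc, best_state

# Toy run: accuracy rises then collapses, mimicking the curves described above.
accs = iter([0.3, 0.4, 0.55, 0.6, 0.58, 0.5, 0.45, 0.4, 0.35, 0.33, 0.33, 0.33])
best, _ = train_with_early_stopping(lambda: None, lambda: next(accs),
                                    lambda: {}, patience=5)
print(best)  # 0.6
```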

[plot of training and testing accuracy/loss over epochs]

What I have tried: I have tried tuning the hyperparameters: lr = 0.001 to 0.000001, weight decay = 0.0001 to 0.00001. Training to 1000 epochs was useless because the model overfits in fewer than 100 epochs. I have also tried increasing/decreasing the model complexity by adding additional fc layers and varying the number of channels in the CNN layers from 8 to 64. I also tried adding more CNN layers, but the model did a bit worse, averaging ~45% accuracy on the test set. I tried manually scheduling the learning rate every 10 epochs; the results were the same. Weight decay didn't seem to affect the results much; I varied it from 0.1 to 0.000001.

From previous testing, I have a model that achieves 60% accuracy on both the testing and the training set. However, when I try to retrain it, the accuracy instantly goes down to ~40% on both sets (training and testing), which makes no sense. I have tried altering the learning rate from 0.01 to 0.00000001, and also tried adjusting the weight decay.
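One possible explanation for the drop on resuming (an assumption, not something confirmed by the post): if only the model weights are reloaded, Adam's moment estimates are reset, and the first updates after restart can be destructive. A minimal sketch of saving and restoring both states (names are illustrative):

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 3)  # stand-in for the real network
optimizer = torch.optim.Adam(model.parameters(), lr=5e-6, weight_decay=1e-4)

# Save BOTH model and optimizer state when checkpointing
checkpoint = {"model": model.state_dict(), "optim": optimizer.state_dict()}
torch.save(checkpoint, "checkpoint.pt")

# Restore both before resuming training
restored = torch.load("checkpoint.pt")
model.load_state_dict(restored["model"])
optimizer.load_state_dict(restored["optim"])
```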

From training the model and the graphs, it seems like the model doesn't know what it's doing for the first 5-10 epochs and then starts to learn rapidly, reaching ~50%-60% accuracy on both sets. This is where the model starts to overfit: from there the model's accuracy increases to 100% on the training set, and the accuracy for the testing set goes down to 33%, which is equivalent to guessing.

Any tips?

Edit:

The model's outputs for the test set are all very close to each other:

0.33960407972335815, 0.311821848154068, 0.34857410192489624

The average standard deviation across the whole test set of the per-image softmax predictions is:

0.017695341517654846

However, the average std for the training set is 0.22, so the model is far more confident on the training data.
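For concreteness, the per-image spread above can be computed like this (using the single quoted test prediction as a stand-in; the 0.0177 figure is the average over all 33 test images, so this one image gives a slightly different value):

```python
import numpy as np

# One test image's softmax output, as quoted above
test_probs = np.array([[0.33960408, 0.31182185, 0.34857410]])

per_image_std = test_probs.std(axis=1)   # std across the 3 class probabilities
avg_std = per_image_std.mean()           # averaged over the (here: one) image
print(avg_std)
```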

F1 Scores:

Micro Average: 0.6060606060606061
Macro Average: 0.5810185185185186
Weighted Average: 0.5810185185185186
Scores for each class: 0.6875 0.5 0.55555556

Here is the confusion matrix: [confusion matrix image]

Asked by Aditya Kendre, Oct 15 '22

1 Answer

I have some suggestions for things I would try; maybe you've already done some of them:

  • Increase the dropout probability; that could decrease overfitting.
  • I may have missed it, but if you don't already, shuffle all the samples.
  • There is not much data. Have you thought about using another neural network to generate more examples of the classes with the lowest scores? I am not sure it applies here, but even randomly rotating or scaling the images can produce more training examples.
  • Another approach, if you haven't done it already: use transfer learning with another popular CNN and see how it does. That gives you a comparison point for whether the problem is your architecture or the lack of examples.

I know these are just suggestions, but maybe, if you haven't tried some of them, they will bring you closer to the solution. Good luck!
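The rotate/flip augmentation idea above can be sketched in pure PyTorch (whether spatial augmentation is meaningful for connectome adjacency matrices is an assumption worth validating on this data):

```python
import torch

def augment(x: torch.Tensor) -> torch.Tensor:
    """Randomly flip and rotate a (C, H, W) image by a multiple of 90 degrees."""
    if torch.rand(()) < 0.5:
        x = torch.flip(x, dims=[-1])        # horizontal flip
    k = int(torch.randint(0, 4, (1,)))      # 0, 90, 180, or 270 degrees
    return torch.rot90(x, k, dims=[-2, -1])

img = torch.rand(1, 34, 34)
aug = augment(img)
print(aug.shape)  # torch.Size([1, 34, 34])
```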
Answered by Maciek Woźniak, Oct 19 '22