I'm doing a binary classification with CNNs and the data is imbalanced where the positive medical image : negative medical image = 0.4 : 0.6. So I want to use SMOTE to oversample the positive medical image data before training. However, the dimension of the data is 4D (761,64,64,3) which cause the error
Found array with dim 4. Estimator expected <= 2
So, I reshape my train_data:
X_res, y_res = smote.fit_sample(X_train.reshape(X_train.shape[0], -1), y_train.ravel())
And it works fine. Before feed it to CNNs, I reshape it back by:
X_res = X_res.reshape(X_res.shape[0], 64, 64, 3)
Now, I'm not sure is it a correct way to oversample and will the reshape operator change the images' structer?
It is used to obtain a synthetically class-balanced or nearly class-balanced training set, which is then used to train the classifier. SMOTE actually performs better than simple oversampling, but although it is not quite popular with images as much as its popularity when dealing with structured data.
SMOTE: a powerful solution for imbalanced data SMOTE stands for Synthetic Minority Oversampling Technique. The method was proposed in a 2002 paper in the Journal of Artificial Intelligence Research. SMOTE is an improved method of dealing with imbalanced data in classification problems.
One of the basic approaches to deal with the imbalanced datasets is to do data augmentation and re-sampling. There are two types of re-sampling such as under-sampling when we removing the data from the majority class and over-sampling when we adding repetitive data to the minority class.
What happens under the hood is a 5-fold CV meaning the X_train is again split in 80:20 for five times where 20% of the data set is where SMOTE isn’t applied. This is my understanding. Hi ! SMOTE works for imbalanced image datasets too ? No, it is designed for tabular data. You might be able to use image augmentation in the same manner.
Choose a Career Track Мaster the skills for the specific job role you want - Data Scientist, Data Analyst, or Business… So, what is SMOTE? SMOTE or Synthetic Minority Oversampling Technique is an oversampling technique but SMOTE working differently than your typical oversampling.
The pipeline is fit and then the pipeline can be used to make predictions on new data. Yes, call pipeline.predict () to ensure the data is prepared correctly prior to being passed to the model. Hi Jason, SMOTE sampling is done before / after data cleaning or pre-processing or feature engineering???
You can apply SMOTE directly fir multi-class, or you can specify the preferred balance of the classes to SMOTE. Thanks for sharing Jason. In imblearn.pipeline the predict method says tahar it applies transforms AND sampling and then the final predict of the estimator.
I had a similar issue. I had used the reshape function to reshape the image (basically flattened the image)
X_train.shape
(8000, 250, 250, 3)
ReX_train = X_train.reshape(8000, 250 * 250 * 3)
ReX_train.shape
(8000, 187500)
smt = SMOTE()
Xs_train, ys_train = smt.fit_sample(ReX_train, y_train)
Although, this approach is pathetically slow, but helped to improve the performance.
from imblearn.over_sampling import SMOTE
sm = SMOTE(random_state=42)
train_rows=len(X_train)
X_train = X_train.reshape(train_rows,-1)
(80,30000)
X_train, y_train = sm.fit_resample(X_train, y_train)
X_train = X_train.reshape(-1,100,100,3)
(>80,100,100,3)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With