I have 2 numpy arrays as follows:
images contains the names of image files (images.shape is (N, 3, 128, 128)):
image_1.jpg
image_2.jpg
image_3.jpg
image_4.jpg
labels contains the corresponding labels (0-3) (labels.shape is (N,)):
1
1
3
2
The issue I'm facing is that the classes are imbalanced: by sample count, class 1 > 0 > 2 > 3.
I'd like to balance the final dataset by randomly removing matching elements from images and labels.
So far I'm using Counter to identify the number of images per class:
from collections import Counter
import numpy as np
count = Counter(labels)
print(count)
>>>Counter({'1': 2991, '0': 2953, '2': 2510, '3': 2488})
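As a NumPy-only alternative to Counter, np.unique with return_counts=True returns the sorted classes together with their counts. The smallest count then gives the per-class target size without hard-coding 2488. This is a minimal sketch, and the small labels array below is made-up stand-in data, not the real dataset:

```python
import numpy as np

# Made-up stand-in for the real labels array (assumption, for illustration only)
labels = np.array([1, 1, 3, 2, 0, 1, 3, 2, 0, 1])

# Count samples per class without collections.Counter
classes, counts = np.unique(labels, return_counts=True)
print(dict(zip(classes.tolist(), counts.tolist())))  # {0: 2, 1: 4, 2: 2, 3: 2}

# Size of the smallest class: the number of samples to keep per class
n = int(counts.min())
print(n)  # 2
```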
How would you suggest I randomly pop matching elements from images and labels so that classes 0, 1, and 2 each contain 2488 samples, the same as class 3?
You could use np.random.choice to draw n random indices per class (without replacement) and stack them into an integer index array, which you can then apply to both labels and images to balance the dataset:
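n = 2488  # samples to keep per class (the size of class 3, the smallest)
# For each class, pick n distinct indices of that class at random
mask = np.hstack([np.random.choice(np.where(labels == l)[0], n, replace=False)
                  for l in np.unique(labels)])
# Index both arrays with the same mask so images and labels stay aligned
images, labels = images[mask], labels[mask]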
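The same idea can be wrapped in a reusable function. This is a minimal sketch, not the answer's own code: the name balance_by_undersampling is hypothetical, it uses NumPy's Generator API (np.random.default_rng) instead of the legacy np.random.choice, it defaults n to the size of the smallest class, and it shuffles the result because concatenating per-class index arrays groups the samples by class. The arrays in the usage part are synthetic.

```python
import numpy as np


def balance_by_undersampling(images, labels, n=None, seed=None):
    """Randomly keep n samples per class (default: size of the smallest class).

    Hypothetical helper: the same index array is applied to images and labels,
    so each image stays paired with its label.
    """
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(labels, return_counts=True)
    if n is None:
        n = int(counts.min())
    # n distinct indices per class, drawn without replacement
    idx = np.concatenate([
        rng.choice(np.flatnonzero(labels == c), size=n, replace=False)
        for c in classes
    ])
    # Concatenation groups samples by class; shuffle so classes are interleaved
    rng.shuffle(idx)
    return images[idx], labels[idx]


# Synthetic data (assumption): 50/40/30/20 samples of classes 0-3,
# and each "image" is filled with its own label so alignment is easy to check
labels = np.repeat(np.arange(4), [50, 40, 30, 20])
images = labels[:, None, None, None] * np.ones((1, 3, 8, 8))

bal_images, bal_labels = balance_by_undersampling(images, labels, seed=1)
print(bal_images.shape)                               # (80, 3, 8, 8)
print(np.unique(bal_labels, return_counts=True)[1])   # [20 20 20 20]
```

Shuffling matters mostly if the balanced arrays are later split into batches or train/validation sets without another shuffle; otherwise it is harmless.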