We have a retinal dataset in which diseased-eye samples make up 70 percent of the data and non-diseased-eye samples make up the remaining 30 percent. We want a dataset in which the diseased and non-diseased samples are equal in number. Is there a function available that can do this?
I would choose to do this with Pandas DataFrame and numpy.random.choice. In that way it is easy to do random sampling to produce equally sized data-sets. An example:
import pandas as pd
import numpy as np
data = pd.DataFrame(np.random.randn(7, 4))
data['Healthy'] = [1, 1, 0, 0, 1, 1, 1]
This data has two non-healthy and five healthy samples. To randomly pick two samples from the healthy population you do:
healthy_indices = data[data.Healthy == 1].index
random_indices = np.random.choice(healthy_indices, 2, replace=False)
healthy_sample = data.loc[random_indices]
To automatically pick a subsample of the same size as the non-healthy group you can do:
sample_size = sum(data.Healthy == 0) # Equivalent to len(data[data.Healthy == 0])
random_indices = np.random.choice(healthy_indices, sample_size, replace=False)
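Putting the steps above together, a minimal sketch of building the final balanced DataFrame could look like the following. It uses DataFrame.sample rather than np.random.choice for the draw, which is a substitution on my part and not the code above; the fixed seeds and the names non_healthy, healthy and balanced are also just assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Same toy data as above; the seed is an assumption, used only for reproducibility.
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.standard_normal((7, 4)))
data['Healthy'] = [1, 1, 0, 0, 1, 1, 1]

non_healthy = data[data.Healthy == 0]  # minority group, kept whole
healthy = data[data.Healthy == 1]      # majority group, to be under-sampled

# Draw as many healthy rows as there are non-healthy rows, without replacement.
healthy_sample = healthy.sample(n=len(non_healthy), random_state=0)

# Stack both groups, then shuffle the rows so the classes are interleaved.
balanced = pd.concat([healthy_sample, non_healthy]).sample(frac=1, random_state=0)
```

Here balanced.Healthy.value_counts() should report the same count for both classes. The original row labels are kept, so you can still trace each row back to data.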
You can use np.random.choice for naive under-sampling as suggested previously, but one issue is that some of your random samples may be very similar to each other and thus misrepresent the data set.
A better option is to use the imbalanced-learn package that has multiple options for balancing a dataset. A good tutorial and description of these can be found here.
The package lists a few good options for under-sampling (from their GitHub):
- Random majority under-sampling with replacement
- Extraction of majority-minority Tomek links
- Under-sampling with Cluster Centroids
- NearMiss-(1 & 2 & 3)
- Condensed Nearest Neighbour
- One-Sided Selection
- Neighbourhood Cleaning Rule
- Edited Nearest Neighbours
- Instance Hardness Threshold
- Repeated Edited Nearest Neighbours
- AllKNN