I have a dataset with four regression labels, and the number of samples for each label is imbalanced. The data is attached to this post as data_multi_label_reg.csv.
It has 5 columns: four of them (A, B, C, and D) are the regression labels, and the fifth, sample, holds the sample (training example) itself.
Each sample is defined for only one of the four labels, so each sample carries one label value and the other three are empty.
Also, the labels are highly imbalanced: for instance, D is defined for most of the samples, while A is defined for the fewest.
Is there a Python package that can split this dataset into train and test sets (like train_test_split) such that, in both the train and the test split, the proportion of each label is the same as in the original dataset?
scikit-learn has the following function:

from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(x, y,
                                                    test_size=0.33,
                                                    random_state=0,
                                                    stratify=y)
But this seems to work only with a single-label output. Is there a similar function for multi-label regression outputs?
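Because each sample carries exactly one label value, one possible workaround (a sketch, not taken from the post) is to reduce the problem to single-label stratification: for every row, record which of A, B, C, D is filled, and pass that name as stratify to the same scikit-learn function. This assumes "ratio of each label" means the share of samples per label column; it does not stratify on the regression values themselves. The DataFrame below is a hypothetical toy stand-in for data_multi_label_reg.csv, with made-up label counts.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

label_cols = ["A", "B", "C", "D"]

# Hypothetical toy stand-in for data_multi_label_reg.csv: every row has exactly
# one of A..D filled in; D is the most common label and A the rarest.
# With the real file this would be: df = pd.read_csv("data_multi_label_reg.csv")
rng = np.random.default_rng(0)
counts = {"A": 10, "B": 20, "C": 30, "D": 60}
n = sum(counts.values())
df = pd.DataFrame(np.nan, index=range(n), columns=["sample"] + label_cols)
df["sample"] = rng.normal(size=n)
start = 0
for label, k in counts.items():
    # .loc slicing is inclusive, so start..start+k-1 covers k rows
    df.loc[start:start + k - 1, label] = rng.normal(size=k)
    start += k

# For each row, the name of the single non-empty label column (e.g. "D").
group = df[label_cols].notna().astype(int).idxmax(axis=1)

# Stratifying on that name keeps each label's share of samples (approximately)
# the same in the train and test splits as in the full dataset.
train_df, test_df = train_test_split(df, test_size=0.33, random_state=0, stratify=group)

print(group.loc[train_df.index].value_counts(normalize=True))
print(group.loc[test_df.index].value_counts(normalize=True))
```

The label columns of train_df and test_df can then be used as the regression targets; the empty cells stay NaN, so each model or loss still has to handle the missing labels.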
You could take a look at the scikit-multilearn library. It provides the iterative_train_test_split function in its skmultilearn.model_selection module. Check out this simple usage example and this doc.