Multi-label imbalanced train test split [closed]

Question

I have a data set with four regression labels. The number of samples per label is imbalanced. The data is attached to this post as data_multi_label_reg.csv.

It has 5 columns. Four of them, A, B, C, and D, hold the regression labels; the fifth, sample, identifies the sample (training example).

Each sample is defined for only one of the four labels, so each sample carries one label value and the other three are empty.

The labels are also highly imbalanced. For instance, D is defined for most of the samples, while A is defined for the fewest.

Is there a Python package that can split this data set into train and test sets such that, in both splits, the ratio of each label is the same as in the original data set?

sklearn has the following function:

from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(x, y,
                                                    test_size=0.33,
                                                    random_state=0,
                                                    stratify=y)

But this seems to work only with single-label output. Is there a similar function for multi-label regression output?

lezaf · Accepted Answer

You could take a look at the scikit-multilearn library. It provides the iterative_train_test_split function (in skmultilearn.model_selection). Check out this simple usage example and this doc.
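Since the question says each sample has exactly one of the four labels filled in, another option might be to skip the multi-label machinery altogether. The sketch below is not the scikit-multilearn approach named above; it is a plain sklearn alternative that builds one categorical key per row (the name of the column that is defined) and passes it to stratify. The toy data frame stands in for data_multi_label_reg.csv: its values and the 6/12/18/64 label counts are made up, and the column names A, B, C, D and sample are taken from the question's description.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

label_cols = ["A", "B", "C", "D"]

# Toy stand-in for data_multi_label_reg.csv (made-up values and counts):
# every row has exactly one of A, B, C, D filled in, and D dominates.
counts = {"A": 6, "B": 12, "C": 18, "D": 64}
rows = []
for label, n in counts.items():
    for _ in range(n):
        row = {col: np.nan for col in label_cols}
        row[label] = 1.0 + len(rows)  # arbitrary regression target
        rows.append(row)
df = pd.DataFrame(rows)
df.insert(0, "sample", range(len(df)))

# For each row, find which label column is non-empty.
mask = df[label_cols].notna().to_numpy()
defined_label = pd.Series(
    np.array(label_cols)[mask.argmax(axis=1)], index=df.index
)

# Stratify on that single categorical key instead of on the 4-column target.
train_df, test_df = train_test_split(
    df,
    test_size=0.33,
    random_state=0,
    stratify=defined_label,
)

# Compare the share of each label in the full data and in the test split.
print(defined_label.value_counts(normalize=True).sort_index())
print(defined_label[test_df.index].value_counts(normalize=True).sort_index())
```

With the real file, the toy frame would presumably be replaced by pd.read_csv("data_multi_label_reg.csv"), assuming the columns are named as described. This only works while every row has exactly one label defined; if rows can carry several labels at once, an iterative stratifier such as the one in scikit-multilearn is the better fit.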

Tags:

python

pandas

machine-learning

numpy

scikit-learn

Abdul Karim Khan
