 

How to calculate class weights of a Pandas DataFrame for Keras?

I'm trying to compute class weights with scikit-learn's compute_class_weight:

print(Y)
print(Y.shape)

class_weights = compute_class_weight('balanced',
                                     np.unique(Y),
                                     Y)
print(class_weights)

But this gives me an error:

ValueError: classes should include all valid labels that can be in y

My Y looks like:

       0  1  2  3  4
0      0  0  1  0  0
1      1  0  0  0  0
2      0  0  0  1  0
3      0  0  1  0  0
...
14992  0  0  1  0  0
14993  0  0  1  0  0

And my Y.shape looks like: (14993, 5)

In my Keras model, I want to pass the class weights to fit, because the class distribution is uneven:

model.fit(X, Y, epochs=100, shuffle=True, batch_size=1500, class_weight=class_weights, validation_split=0.05, verbose=1, callbacks=[csvLogger])
Shamoon asked Feb 23 '19 13:02



2 Answers

Just transform the one-hot encoding to categorical labels:

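import numpy as np
from sklearn.utils import class_weight

# idxmax returns, for each row, the label of the column that holds the 1
y = Y.idxmax(axis=1)

# classes and y are keyword-only in recent scikit-learn releases
class_weights = class_weight.compute_class_weight('balanced',
                                                  classes=np.unique(y),
                                                  y=y)

# Convert class_weights to a dictionary to pass it to class_weight in model.fit
class_weights = dict(enumerate(class_weights))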
Andreas K. answered Sep 18 '22 10:09
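One caveat with the conversion above: idxmax (like NumPy's argmax) returns the first column for a row that contains no 1 at all, so a malformed row is silently counted as class 0. A minimal sketch of a check that could run before the conversion, assuming the one-hot data is available as a NumPy array (for example via Y.values); the helper name one_hot_to_labels and the small sample array are made up for illustration:

```python
import numpy as np

def one_hot_to_labels(one_hot):
    """Convert an (n_samples, n_classes) one-hot array to integer labels."""
    one_hot = np.asarray(one_hot)
    # argmax returns 0 for an all-zero row, so reject rows whose entries
    # do not sum to 1 instead of silently assigning them to class 0.
    if not np.all(one_hot.sum(axis=1) == 1):
        raise ValueError("every row must sum to exactly 1")
    return one_hot.argmax(axis=1)

# Hypothetical stand-in for Y.values (first four rows from the question)
Y_values = np.array([
    [0, 0, 1, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 1, 0, 0],
])

labels = one_hot_to_labels(Y_values)
print(labels)  # [2 0 3 2]
```

The resulting integer labels can be passed to compute_class_weight in place of y.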


Create some sample data with at least one example per class

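import numpy as np
import pandas as pd
from sklearn.utils.class_weight import compute_class_weight

# One-hot columns '0'..'4'; each row has exactly one 1
df = pd.DataFrame({
    '0': [0, 1, 0, 0, 0, 0],
    '1': [0, 0, 0, 0, 1, 0],
    '2': [1, 0, 0, 1, 0, 0],
    '3': [0, 0, 1, 0, 0, 0],
    '4': [0, 0, 0, 0, 0, 1],
})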

Stack the columns (convert from wide to long table)

df = df.stack().reset_index()
>>> df.head()

   level_0 level_1  0
0        0       0  0
1        0       1  0
2        0       2  1
3        0       3  0
4        0       4  0

Get the class for each data point

Y = df[df[0] == 1]['level_1']
>>> Y
2     2
5     0
13    3
17    2
21    1
29    4

Compute class weights

# classes and y are keyword-only in recent scikit-learn releases
class_weights = compute_class_weight(
    'balanced', classes=np.unique(Y), y=Y
)
>>> print(class_weights)
[1.2 1.2 0.6 1.2 1.2]
ulmefors answered Sep 21 '22 10:09
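For reference, the 'balanced' mode weights each class as n_samples / (n_classes * count of that class). A minimal standard-library sketch that reproduces the numbers printed above for the six sample labels; the function name balanced_weights is made up for illustration and is not part of scikit-learn:

```python
from collections import Counter

def balanced_weights(labels):
    """Return {class: n_samples / (n_classes * class_count)}."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * counts[cls]) for cls in sorted(counts)}

# The class of each of the six sample rows, as extracted into Y above
labels = ['2', '0', '3', '2', '1', '4']

weights = balanced_weights(labels)
print(weights)
```

Class '2' appears twice among six samples, so it gets 6 / (5 * 2) = 0.6, while every other class appears once and gets 6 / (5 * 1) = 1.2.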