
Multi-label compute class weight - unhashable type

I am working on a multi-label classification problem with 13 possible outputs, using a neural network built with Keras, sklearn, etc.

Each output can be an array like [0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0].

I have an imbalanced dataset and I am trying to apply the compute_class_weight method, like this:

class_weight = compute_class_weight('balanced', np.unique(Y_train), Y_train)

When I run my code, I get TypeError: unhashable type: 'numpy.ndarray':

Traceback (most recent call last):
  File "main.py", line 115, in <module>
    train(dataset, labels)
  File "main.py", line 66, in train
    class_weight = compute_class_weight('balanced', np.unique(Y_train), Y_train)
  File "/home/python-env/env/lib/python3.6/site-packages/sklearn/utils/class_weight.py", line 41, in compute_class_weight
    if set(y) - set(classes):
TypeError: unhashable type: 'numpy.ndarray'

I know this is because I am working with arrays. I already tried building a dict, i.e.:

class_weight_dict = dict(enumerate(np.unique(y_train), class_weight))

I don't know what else to do. I tried other strategies, but without success. Any ideas?

Thanks in advance!

Alex Colombari asked Oct 16 '22 06:10


2 Answers

I encountered a similar problem recently, so I am sharing my thought process.

If your "class imbalance" means that some label combinations appear more frequently than others, for example 10 samples of [0,1,0,0,1] but only 1 of [0,1,0,0,0], you can use compute_sample_weight("balanced", Y_train) instead of compute_class_weight(). If I am right, this function assigns a weight to EACH sample in the training dataset, so the returned array has one entry per training sample. (For multi-output labels, the sklearn documentation says the weights of each column of y are multiplied together.) You can then pass these sample weights to training together with X_train and y_train, for example through the sample_weight argument of fit().
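As a rough illustration of the compute_sample_weight suggestion, here is a minimal sketch. The tiny Y_train matrix is made up for the example and is not the asker's data; the only library call used is sklearn.utils.class_weight.compute_sample_weight.

```python
import numpy as np
from sklearn.utils.class_weight import compute_sample_weight

# Hypothetical multi-label targets: 4 samples, 3 labels each.
# The first three rows share one label pattern, the last row is rare.
Y_train = np.array([
    [0, 1, 0],
    [0, 1, 0],
    [0, 1, 0],
    [1, 0, 1],
])

# One weight per training sample. For 2D y, each column is balanced
# separately and the per-column weights are multiplied per row.
sample_weight = compute_sample_weight("balanced", Y_train)

print(sample_weight.shape)
print(sample_weight)
```

The resulting array can then be handed to training, for instance as model.fit(X_train, Y_train, sample_weight=sample_weight), assuming your model's fit() accepts that argument.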

If your "class imbalance" refers to having more negatives than positives (more 0s than 1s) in the labels, which gives an unrealistically high accuracy score during training, I think the answer by @Prateek can be a solution; the weights returned by that function are for the values 0 and 1.

Someone also wrote a brilliant piece of code in the question "Multi-label classification with class weights in Keras" that addresses this problem.

If your "class imbalance" means that certain classes appear more often than others, for example 9 out of 10 samples contain label 2 but only 1 contains label 3, I do not know how to solve it using class_weight or sample_weight. Maybe you can count the number of appearances of each class by hand and then calculate the weight of each class with the following formula:

# weight_of_class_1 = n_samples / (n_classes * n_freq_class_1)

# n_samples: total number of samples
# n_classes: number of classes
# n_freq_class_1: number of appearances of class 1 across all your labels

This is the formula compute_class_weight uses for the 'balanced' mode, but I am not sure whether the calculated weights suit your situation.
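The hand-counting idea above could be sketched as follows. This is only one possible reading of the formula for a multi-label matrix: per_label_weights is a hypothetical helper, n_samples is taken to be the number of rows, and the example data is invented.

```python
import numpy as np

def per_label_weights(Y):
    """Hypothetical helper: one weight per label column, following
    n_samples / (n_classes * n_freq_class)."""
    Y = np.asarray(Y)
    n_samples, n_classes = Y.shape
    # How many samples carry each label (column sums of the 0/1 matrix).
    freq = Y.sum(axis=0)
    # Assumption: treat labels that never occur as if they occurred once,
    # purely to avoid dividing by zero.
    freq = np.maximum(freq, 1)
    return n_samples / (n_classes * freq)

# Made-up data: label 0 is in every sample, labels 1 and 2 are rare.
Y_train = np.array([
    [1, 0, 0],
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 1],
])

weights = per_label_weights(Y_train)
print(weights)
```

Rare labels end up with larger weights than frequent ones; whether this scaling suits a given loss function is something to check experimentally.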

Jasminy answered Oct 20 '22 09:10


This most likely happens because your label array is 2D instead of 1D. Try:

import numpy as np
from sklearn.utils import class_weight

y_flat = np.ravel(y_train, order='C')  # flatten the 2D labels to 1D
class_weights = class_weight.compute_class_weight('balanced',
                                                  classes=np.unique(y_flat),
                                                  y=y_flat)
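To see why the 2D array triggers the error and what flattening yields, here is a small self-contained sketch. The Y_train values are invented for illustration. classes and y are passed as keywords because recent scikit-learn versions require that.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Made-up 2D multi-label targets (4 samples, 3 labels).
Y_train = np.array([
    [0, 0, 1],
    [0, 1, 0],
    [0, 0, 0],
    [1, 0, 0],
])

# Iterating over a 2D array yields 1D row arrays, which are not hashable,
# so building a set from it raises TypeError.
try:
    set(Y_train)
    rows_hashable = True
except TypeError:
    rows_hashable = False

# After flattening, every element is a plain 0 or 1.
flat = Y_train.ravel()
classes = np.unique(flat)
weights = compute_class_weight("balanced", classes=classes, y=flat)

# Map each class value to its weight.
class_weight_dict = dict(zip(classes.tolist(), weights.tolist()))
print(rows_hashable, class_weight_dict)
```

Note that this gives one weight for the value 0 and one for the value 1 across all labels, not a separate weight for each of the 13 outputs.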

Prateek Mehta answered Oct 20 '22 10:10