I am unsure how to interpret the default behavior of Keras in the following situation:
My Y (ground truth) was set up using scikit-learn's MultiLabelBinarizer().
Therefore, to give a random example, one row of my Y matrix is encoded as: [0,0,0,1,0,1,0,0,0,0,1].
So I have 11 classes that could be predicted, and more than one can be true; hence the multilabel nature of the problem. There are three labels for this particular sample.
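For reference, a minimal sketch of that setup (the label names here are invented for illustration):

from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical per-sample label sets; the real ones come from the dataset.
y_raw = [['news', 'politics'], ['sports'], ['news', 'sports', 'tech']]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(y_raw)
print(mlb.classes_)  # ['news' 'politics' 'sports' 'tech']
print(Y)
# [[1 1 0 0]
#  [0 0 1 0]
#  [1 0 1 1]]
# Each row is multi-hot: a 1 for every label present in that sample.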
I train the model as I would for a non-multilabel problem (business as usual) and I get no errors.
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.optimizers import SGD

model = Sequential()
model.add(Dense(5000, activation='relu', input_dim=X_train.shape[1]))
model.add(Dropout(0.1))
model.add(Dense(600, activation='relu'))
model.add(Dropout(0.1))
model.add(Dense(y_train.shape[1], activation='softmax'))

sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy',
              optimizer=sgd,
              metrics=['accuracy'])

model.fit(X_train, y_train, epochs=5, batch_size=2000)

score = model.evaluate(X_test, y_test, batch_size=2000)
score

What does Keras do when it encounters my y_train and sees that it is "multi" one-hot encoded, meaning there is more than one 'one' present in each row of y_train?

Basically, does Keras automatically perform multilabel classification?

Are there any differences in the interpretation of the scoring metrics?
There are two main methods for tackling a multi-label classification problem: problem transformation methods and algorithm adaptation methods. The former transform the multi-label problem into a set of independent binary classification problems, each of which can then be handled by a standard binary classifier.
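For illustration, the simplest such transformation ("binary relevance") trains one independent binary classifier per label. This is not part of the Keras solution below; it is a minimal scikit-learn sketch, assuming X_train, y_train, and X_test as in the question:

from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# One independent binary classifier per column of the multi-hot Y.
clf = OneVsRestClassifier(LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)
preds = clf.predict(X_test)  # multi-hot rows, same shape as y_test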
We use the sigmoid activation function on the final layer. Sigmoid maps each output node's score to a value between 0 and 1, independently of the other nodes' scores. If the score for a class exceeds 0.5, the sample is assigned that class.
Don't use softmax.
Use sigmoid as the activation of your output layer.
Use binary_crossentropy as the loss function.
Use predict for evaluation.
With softmax, increasing the score for one label lowers all the others, because the outputs form a probability distribution that sums to 1. You don't want that when multiple labels can be true at the same time.
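A quick numeric illustration with made-up logits for three labels, two of which are true:

import numpy as np

logits = np.array([2.0, 2.0, -1.0])

softmax = np.exp(logits) / np.exp(logits).sum()
sigmoid = 1 / (1 + np.exp(-logits))

print(softmax)  # ~[0.49, 0.49, 0.02]: forced to sum to 1, so the two
                # true labels compete and neither can approach 1.0
print(sigmoid)  # ~[0.88, 0.88, 0.27]: each label scored independently,
                # so both true labels clear the 0.5 threshold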
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Activation
from tensorflow.keras.optimizers import SGD

model = Sequential()
model.add(Dense(5000, activation='relu', input_dim=X_train.shape[1]))
model.add(Dropout(0.1))
model.add(Dense(600, activation='relu'))
model.add(Dropout(0.1))
model.add(Dense(y_train.shape[1], activation='sigmoid'))

sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='binary_crossentropy',
              optimizer=sgd)

model.fit(X_train, y_train, epochs=5, batch_size=2000)

preds = model.predict(X_test)
preds[preds >= 0.5] = 1  # threshold each label independently
preds[preds < 0.5] = 0
# score = compare preds and y_test
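On the scoring question: Keras's plain 'accuracy' with binary_crossentropy is computed element-wise over all 11 outputs and averaged, so it can look high even when entire label sets are wrong (e.g. when most entries are 0). Per-label metrics are usually more informative; a sketch using scikit-learn, assuming preds and y_test from the code above:

from sklearn.metrics import f1_score, hamming_loss

# Micro-averaged F1 pools true/false positives across all 11 labels.
print('micro F1:', f1_score(y_test, preds, average='micro'))
# Hamming loss: fraction of individual label assignments that are wrong.
print('hamming loss:', hamming_loss(y_test, preds))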