Is it possible to automatically infer the class_weight from flow_from_directory in Keras?

I have an imbalanced multi-class dataset and I want to use the class_weight argument from fit_generator to give weights to the classes according to the number of images of each class. I'm using ImageDataGenerator.flow_from_directory to load the dataset from a directory.

Is it possible to directly infer the class_weight argument from the ImageDataGenerator object?

Fábio Perez asked Mar 03 '17 18:03


People also ask

What is Class_weight in keras?

In Keras, the class_weight parameter of fit() is commonly used to weight classes differently in the loss. It takes a dictionary that maps class indices to weights, for example class_weight = {0: 1., 1: 50., 2: 2.}. With these weights, each instance of class 1 counts as much as 50 instances of class 0 or 25 instances of class 2.

How do you calculate class weights?

In binary classification, class weights can be obtained by calculating the frequency of the positive and the negative class and inverting each one, so that, when multiplied by the per-class loss, errors on the underrepresented class count much more than errors on the majority class.
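
A minimal sketch of the inverse-frequency idea for the binary case, in plain Python. The label list is made-up toy data and the helper name inverse_frequency_weights is hypothetical, not part of Keras.

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Return {class_id: 1 / relative_frequency} for the given labels."""
    counts = Counter(labels)
    total = len(labels)
    # relative frequency = count / total, so its inverse is total / count
    return {class_id: total / count for class_id, count in counts.items()}

# Toy data (assumption): 90 negatives (0) and 10 positives (1).
labels = [0] * 90 + [1] * 10
weights = inverse_frequency_weights(labels)
print(weights)

# With these weights, both classes contribute the same total weight.
total_negative = 90 * weights[0]
total_positive = 10 * weights[1]
print(total_negative, total_positive)
```

Because the summed weight is the same for both classes, neither class dominates the loss simply by being more common.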

How do class weights work?

Class weights give all the classes equal importance on gradient updates, on average, regardless of how many samples we have from each class in the training data. This prevents models from predicting the more frequent class more often just because it's more common.


1 Answer

Just figured out a way of achieving this.

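from collections import Counter
from keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator()
train_generator = train_datagen.flow_from_directory(...)

# Count the images in each class; train_generator.classes holds one class index per image.
counter = Counter(train_generator.classes)
max_val = float(max(counter.values()))
# Weight each class by (largest class size / its own size), so the largest class gets 1.0.
class_weights = {class_id: max_val / num_images for class_id, num_images in counter.items()}

model.fit_generator(...,
                    class_weight=class_weights)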

train_generator.classes holds the integer class index of each image, so Counter(train_generator.classes) counts the number of images in each class.
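
To see what the computation above produces without loading any images, here is a small sketch that swaps in a hand-written list for train_generator.classes. The list is an assumption made up for illustration: four images of class 0, two of class 1 and one of class 2.

```python
from collections import Counter

# Stand-in for train_generator.classes: one integer class index per image.
classes = [0, 0, 0, 0, 1, 1, 2]

counter = Counter(classes)
max_val = float(max(counter.values()))
class_weights = {class_id: max_val / num_images
                 for class_id, num_images in counter.items()}
print(class_weights)  # {0: 1.0, 1: 2.0, 2: 4.0}
```

The largest class gets weight 1.0 and every smaller class gets a proportionally larger weight.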

Note that these weights may not be good for convergence, but you can use them as a starting point for other weighting schemes based on class occurrence.
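
As one example of another occurrence-based scheme, the sketch below uses the "balanced" heuristic n_samples / (n_classes * count), which is the formula scikit-learn documents for class_weight='balanced'. This is an alternative I am swapping in for illustration, not the method from the answer; the helper name balanced_weights and the counts are hypothetical.

```python
from collections import Counter

def balanced_weights(counter):
    """Return {class_id: n_samples / (n_classes * count)} for a Counter of labels."""
    n_samples = sum(counter.values())
    n_classes = len(counter)
    return {class_id: n_samples / (n_classes * count)
            for class_id, count in counter.items()}

# Toy counts (assumption): the same shape as Counter(train_generator.classes).
counter = Counter({0: 400, 1: 200, 2: 100})
class_weights = balanced_weights(counter)
print(class_weights)

# The weights are scaled so that the weighted sample count equals the real one.
weighted_total = sum(class_weights[c] * counter[c] for c in counter)
print(weighted_total)
```

Compared with dividing by the largest class, this keeps the overall scale of the loss roughly unchanged, which may interact more gently with an existing learning rate. The resulting dictionary can be passed as class_weight in the same way.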

This answer was inspired by: https://github.com/fchollet/keras/issues/1875#issuecomment-273752868

Fábio Perez answered Oct 05 '22 07:10