One-hot encoding of multi-label images in Keras

Tags: python, pandas, keras, one-hot-encoding, multilabel-classification

I am using the PASCAL VOC 2012 dataset for image classification. Some images have multiple labels, whereas others have a single label, as shown below:

    0  2007_000027.jpg               {'person'}
    1  2007_000032.jpg  {'aeroplane', 'person'}
    2  2007_000033.jpg            {'aeroplane'}
    3  2007_000039.jpg            {'tvmonitor'}
    4  2007_000042.jpg                {'train'}

I want to one-hot encode these labels to train the model. However, I can't use keras.utils.to_categorical, because these labels are not integers, and pandas.get_dummies does not give the expected result: it treats each unique combination of labels as a separate category, as shown below.

    {'aeroplane', 'bus', 'car'}  {'aeroplane', 'bus'}  {'tvmonitor', 'sofa'}  {'tvmonitor'} ...

What is the best way to one-hot encode these labels, given that the number of labels varies from image to image?
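
For reference, the target described here is a multi-hot vector: one column per class, with a 1 for every label present in the image. The sketch below builds it by hand with NumPy, assuming the labels are stored as Python sets as in the listing above; the helper name `multi_hot` is hypothetical. It borrows the idea behind `to_categorical` (picking rows of an identity matrix) but sums the rows, so several labels can be active for the same image.

```python
import numpy as np

# Label sets copied from the listing above
labels = [
    {'person'},
    {'aeroplane', 'person'},
    {'aeroplane'},
    {'tvmonitor'},
    {'train'},
]

def multi_hot(label_sets):
    """Hypothetical helper: encode a list of label sets as a multi-hot matrix."""
    # A sorted vocabulary gives a stable, reproducible column order
    classes = sorted(set().union(*label_sets))
    index = {name: i for i, name in enumerate(classes)}
    eye = np.eye(len(classes), dtype=np.int64)
    # For each image, add up the one-hot rows of all its labels
    rows = [eye[[index[name] for name in s]].sum(axis=0) for s in label_sets]
    return np.stack(rows), classes

y, classes = multi_hot(labels)
print(classes)
print(y)
```

A hand-rolled version like this is mainly useful for seeing what the encoding looks like; a library class such as scikit-learn's MultiLabelBinarizer also handles inverse mapping and a fixed class list.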

asked Sep 16 '19 05:09

Sree

1 Answer

The MultiLabelBinarizer class from scikit-learn can one-hot encode multilabel sets, such as those in column b:

    print(df)
                     a                        b
    0  2007_000027.jpg               {'person'}
    1  2007_000032.jpg  {'aeroplane', 'person'}
    2  2007_000033.jpg            {'aeroplane'}
    3  2007_000039.jpg            {'tvmonitor'}
    4  2007_000042.jpg                {'train'}

    import pandas as pd
    from sklearn.preprocessing import MultiLabelBinarizer

    # Learn the classes from column b and build one 0/1 column per class
    mlb = MultiLabelBinarizer()
    df = pd.DataFrame(mlb.fit_transform(df['b']), columns=mlb.classes_)
    print(df)
       aeroplane  person  train  tvmonitor
    0          0       1      0          0
    1          1       1      0          0
    2          1       0      0          0
    3          0       0      0          1
    4          0       0      1          0
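
Without the `classes` parameter, `fit_transform` only creates columns for labels it actually sees, so a subset of the data can yield fewer columns than the model expects. One way around this, sketched below, is to pass the full list of the 20 PASCAL VOC classes via `classes`, so every split gets the same 20 columns in the same order; `inverse_transform` then maps a 0/1 matrix (for example, thresholded model outputs) back to label tuples. The `probs` values and the 0.5 threshold are illustrative assumptions, not part of the original answer.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# The 20 object classes of PASCAL VOC 2012
VOC_CLASSES = [
    'aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus', 'car', 'cat',
    'chair', 'cow', 'diningtable', 'dog', 'horse', 'motorbike', 'person',
    'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor',
]

df = pd.DataFrame({
    'a': ['2007_000027.jpg', '2007_000032.jpg', '2007_000033.jpg',
          '2007_000039.jpg', '2007_000042.jpg'],
    'b': [{'person'}, {'aeroplane', 'person'}, {'aeroplane'},
          {'tvmonitor'}, {'train'}],
})

# Fixing `classes` keeps the column count and order identical across splits
mlb = MultiLabelBinarizer(classes=VOC_CLASSES)
y = mlb.fit_transform(df['b'])
print(y.shape)  # (5, 20)

# Illustrative per-class probabilities for one image (assumed model output)
probs = np.zeros((1, len(VOC_CLASSES)))
probs[0, VOC_CLASSES.index('person')] = 0.9
probs[0, VOC_CLASSES.index('dog')] = 0.7
probs[0, VOC_CLASSES.index('car')] = 0.2

# Threshold at 0.5 and map the 0/1 row back to label names
predicted = mlb.inverse_transform((probs >= 0.5).astype(int))
print(predicted)  # [('dog', 'person')]
```

When training in Keras, multi-hot targets like `y` are usually paired with a `sigmoid` output layer and `binary_crossentropy` loss, since a softmax output assumes exactly one class per image.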

Alternatively, applied to the original DataFrame, combine Series.str.join with Series.str.get_dummies, although this should be slower on a large DataFrame:

    df = df['b'].str.join('|').str.get_dummies()
    print(df)
       aeroplane  person  train  tvmonitor
    0          0       1      0          0
    1          1       1      0          0
    2          1       0      0          0
    3          0       0      0          1
    4          0       0      1          0
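
Before settling on one approach, a quick check can confirm that both produce the same matrix on your data. A minimal sketch, assuming column `b` holds Python sets as in the answer; the names `via_mlb` and `via_dummies` are illustrative. Values and column names are compared directly, because the two methods may return different integer dtypes.

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

df = pd.DataFrame({
    'a': ['2007_000027.jpg', '2007_000032.jpg', '2007_000033.jpg',
          '2007_000039.jpg', '2007_000042.jpg'],
    'b': [{'person'}, {'aeroplane', 'person'}, {'aeroplane'},
          {'tvmonitor'}, {'train'}],
})

# Approach 1: MultiLabelBinarizer (columns sorted alphabetically by default)
mlb = MultiLabelBinarizer()
via_mlb = pd.DataFrame(mlb.fit_transform(df['b']), columns=mlb.classes_)

# Approach 2: join each set into 'label1|label2', then split on '|'
via_dummies = df['b'].str.join('|').str.get_dummies()

# Same column order and same 0/1 values, regardless of integer dtype
same_columns = list(via_mlb.columns) == list(via_dummies.columns)
same_values = (via_mlb.to_numpy() == via_dummies.to_numpy()).all()
print(same_columns, same_values)
```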

answered Oct 23 '22 13:10

jezrael
