
MemoryError in keras.utils.np_utils.to_categorical

Tags: python, keras

I have a dataset where the output is one of 46226 categories. I also have millions of samples.

But it seems that Keras/TensorFlow require one-hot encodings of the output.

The problem is that np_utils.to_categorical(y_indices, num_classes) raises a MemoryError, because it has to allocate an 8000 x 46226 matrix.

My PC has 8 GB of memory. Executing numpy.zeros((8000, 46226)) on its own works fine, but when I convert my y_indices to one-hot encodings I get the following error:

    ------------------------------------------------------------------------
    MemoryError                            Traceback (most recent call last)
    <ipython-input-9-7b9df1cf8cee> in <module>()
    ----> 1 Y_cat = to_categorical(Y, num_classes=nb_classes)

    c:\program files\anaconda3\envs\python35\lib\site-packages\keras\utils\np_utils.py in to_categorical(y, num_classes)
         22     num_classes = np.max(y) + 1
         23     n = y.shape[0]
    ---> 24     categorical = np.zeros((n, num_classes))
         25     categorical[np.arange(n), y] = 1
         26     return categorical

    MemoryError: 

Is there any way to get Keras to work around this problem? I would be happy to add some code if someone could point out how best to do it.

Hailin FU asked Sep 19 '17 06:09



1 Answer

You do not actually need one-hot encoded labels. The sparse_categorical_crossentropy loss accepts integer class indices directly, so you can pass y_indices as they are.
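
As a rough sketch of the integer-label approach: the layer sizes, NUM_FEATURES, and the helper names build_model and label_bytes below are illustrative assumptions, not something taken from the question. Only the class count (46226) and the loss name come from the thread.

```python
import numpy as np

NUM_CLASSES = 46226  # number of categories stated in the question
NUM_FEATURES = 100   # hypothetical input width; the question does not give one


def build_model(num_features=NUM_FEATURES, num_classes=NUM_CLASSES):
    """Compile a small classifier that trains on integer class indices."""
    # Imported lazily so the size arithmetic below runs without Keras installed.
    from keras.models import Sequential
    from keras.layers import Dense

    model = Sequential([
        Dense(256, activation="relu", input_shape=(num_features,)),
        Dense(num_classes, activation="softmax"),
    ])
    # The sparse loss expects labels of shape (n,) holding class indices,
    # so no (n, num_classes) one-hot matrix is ever built.
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model


def label_bytes(n_samples, num_classes=NUM_CLASSES):
    """Return (int32 label bytes, float64 one-hot bytes) for n_samples."""
    sparse = n_samples * np.dtype(np.int32).itemsize
    dense = n_samples * num_classes * np.dtype(np.float64).itemsize
    return sparse, dense


# Usage (commented out because it needs Keras and the real data):
# model = build_model()
# model.fit(X, y_indices.astype("int32"), batch_size=128, epochs=10)
```

For the 8000 samples mentioned in the question, label_bytes(8000) gives 32,000 bytes of int32 labels against roughly 2.96 GB for the float64 matrix that np.zeros((n, num_classes)) asks for in the traceback.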

With integer labels the one-hot matrix is never allocated, so the out-of-memory error should not occur. Another alternative is to write a generator (for use with fit_generator) that one-hot encodes the labels on the fly, one batch at a time.
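
A minimal sketch of such a generator, assuming X and y_indices are NumPy arrays already in memory. The function name one_hot_batches and the batch size are hypothetical choices for illustration.

```python
import numpy as np


def one_hot_batches(X, y_indices, num_classes, batch_size=128):
    """Yield (features, one-hot labels) batches forever, as Keras generators do."""
    n = len(X)
    while True:
        order = np.random.permutation(n)  # reshuffle on every pass over the data
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            # Only a (batch, num_classes) matrix exists at any one time.
            labels = np.zeros((len(idx), num_classes), dtype=np.float32)
            labels[np.arange(len(idx)), y_indices[idx]] = 1.0
            yield X[idx], labels


# Usage (commented out because it needs a compiled Keras model and real data):
# steps = int(np.ceil(len(X) / 128))
# model.fit_generator(one_hot_batches(X, y_indices, 46226),
#                     steps_per_epoch=steps, epochs=10)
```

With a batch size of 128 and float32 labels, each batch needs about 24 MB for the label matrix. In more recent Keras releases, fit accepts a generator directly and fit_generator is deprecated, so the call may need adjusting depending on the installed version.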

Dr. Snoopy answered Nov 14 '22 22:11
