
Keras GRU NN KeyError when fitting: "not in index"

I'm having trouble fitting my GRU model to my training data. After a quick search on Stack Overflow, I found this post, which looks quite similar to my issue:

Simplest Lstm training with Keras io

My own model is as follows:

nn = Sequential()
nn.add(Embedding(input_size, hidden_size))
nn.add(GRU(hidden_size_2, return_sequences=False))
nn.add(Dropout(0.2))
nn.add(Dense(output_size))
nn.add(Activation('linear'))

nn.compile(loss='mse', optimizer="rmsprop")

history = History()
nn.fit(X_train, y_train, batch_size=30, nb_epoch=200, validation_split=0.1, callbacks=[history])

And the error is:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-14-e2f199af6e0c> in <module>()
      1 history = History()
----> 2 nn.fit(X_train, y_train, batch_size=30, nb_epoch=200, validation_split=0.1, callbacks=[history])

C:\Users\XXXX\AppData\Local\Continuum\Anaconda\lib\site-packages\keras\models.pyc in fit(self, X, y, batch_size, nb_epoch, verbose, callbacks, validation_split, validation_data, shuffle, show_accuracy, class_weight, sample_weight)
    487                          verbose=verbose, callbacks=callbacks,
    488                          val_f=val_f, val_ins=val_ins,
--> 489                          shuffle=shuffle, metrics=metrics)
    490 
    491     def predict(self, X, batch_size=128, verbose=0):

C:\Users\XXXX\AppData\Local\Continuum\Anaconda\lib\site-packages\keras\models.pyc in _fit(self, f, ins, out_labels, batch_size, nb_epoch, verbose, callbacks, val_f, val_ins, shuffle, metrics)
    199                 batch_ids = index_array[batch_start:batch_end]
    200                 try:
--> 201                     ins_batch = slice_X(ins, batch_ids)
    202                 except TypeError as err:
    203                     raise Exception('TypeError while preparing batch. \

C:\Users\XXXX\AppData\Local\Continuum\Anaconda\lib\site-packages\keras\models.pyc in slice_X(X, start, stop)
     53     if type(X) == list:
     54         if hasattr(start, '__len__'):
---> 55             return [x[start] for x in X]
     56         else:
     57             return [x[start:stop] for x in X]

C:\Users\XXXX\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\core\frame.pyc in __getitem__(self, key)
   1789         if isinstance(key, (Series, np.ndarray, Index, list)):
   1790             # either boolean or fancy integer index
-> 1791             return self._getitem_array(key)
   1792         elif isinstance(key, DataFrame):
   1793             return self._getitem_frame(key)

C:\Users\XXXX\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\core\frame.pyc in _getitem_array(self, key)
   1833             return self.take(indexer, axis=0, convert=False)
   1834         else:
-> 1835             indexer = self.ix._convert_to_indexer(key, axis=1)
   1836             return self.take(indexer, axis=1, convert=True)
   1837 

C:\Users\XXXX\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\core\indexing.pyc in _convert_to_indexer(self, obj, axis, is_setter)
   1110                 mask = check == -1
   1111                 if mask.any():
-> 1112                     raise KeyError('%s not in index' % objarr[mask])
   1113 
   1114                 return _values_from_object(indexer)

KeyError: '[   61 13980 11357  5577 11500 12125 19673 10985  2480  5237  2519 14874\n 16003  2611  3851 10837 11865 14607 10682  5495 10220  5043 23145 11280\n  9547  4766 18323   730  6263] not in index'

Any ideas on how to solve this? Thanks.

EDIT: Some facts about the data:

data_X = pd.read_csv("X.csv")
data_Y = pd.read_csv("Y.csv")

def train_test_split(X,Y, test_size=0.15):  
    #    This just splits data to training and testing parts
    ntrn = int(round(X.shape[0] * (1 - test_size)))
    perms = np.random.permutation(X.shape[0])
    X_train = X.ix[perms[0:ntrn]]
    Y_train = Y.ix[perms[0:ntrn]]
    X_test = X.ix[perms[ntrn:]]
    Y_test = Y.ix[perms[ntrn:]]

    return (X_train, Y_train), (X_test, Y_test) 

X and Y are CSV files containing time series values. Each row of the X file holds 37 consecutive values of the time series plus 2 time values (considered as the past), and each row of the Y file holds 30 values (considered as the forecast to predict).

print X_train[:1]
print y_train[:1]

          0   1   2   3   4   5   6   7   8    9      ...       29   30   31   32  \
1629  84  76  76  72  72  72  72  87  87  100     ...      165  165  169  169   

       33   34   35   36          37          38  
1629  166  166  185  185  1236778440  1236789240  

[1 rows x 39 columns]
       0    1    2    3    4    5    6    7    8    9  ...    20   21   22  \
1629  195  195  195  195  196  196  194  194  192  192 ...   182  182  164   

       23   24   25   26   27   28   29  
1629  164  146  146  128  128  103  103  

[1 rows x 30 columns]
Julian asked Nov 06 '15 10:11


2 Answers

I couldn't use Pandas DataFrames as inputs and outputs to Keras model.fit, at least not with Pandas 0.13.1, which is the standard package from Ubuntu.

Instead, use np.array(X_train) and np.array(Y_train). That worked for me.
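
To illustrate why the conversion helps, here is a minimal sketch on a toy DataFrame (the data and the names batch_ids and X_array are made up for the example, not taken from the question). Per the traceback, Keras slices each batch with x[batch_ids]; on a DataFrame, an integer array in square brackets is looked up as column labels, whereas on a NumPy array it selects rows by position.

```python
import numpy as np
import pandas as pd

# Toy stand-in for X_train (assumed data): 4 rows whose index labels are
# non-contiguous, as they would be after a shuffled train/test split.
X_train = pd.DataFrame(
    {"0": [84, 90, 71, 66], "1": [76, 91, 70, 65]},
    index=[1629, 12, 907, 4],
)

# Positional batch indices, similar to what the traceback shows Keras building.
batch_ids = np.array([0, 2])

# On a DataFrame, square brackets with an integer array look up column labels,
# so positions 0 and 2 are not found and a KeyError is raised.
try:
    X_train[batch_ids]
    raised = False
except KeyError:
    raised = True

# After conversion, the same indices select rows by position.
X_array = np.array(X_train)
batch = X_array[batch_ids]

print(raised)
print(batch)
```

The same conversion would apply to the targets, so that both arguments passed to fit are plain arrays.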

Paul answered Oct 21 '22 06:10


I've experienced a similar issue. In my case the problem was that the Embedding layer is used with predefined input dimensions, so the sequences you pass to this layer should be padded or truncated to input_size using keras.preprocessing.sequence.
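
As a rough illustration of what padding or truncating to a fixed length means, here is a NumPy-only sketch. The helper pad_or_truncate and the name maxlen are hypothetical and written for this example; they are not part of Keras. It pads and truncates on the left, which I believe matches the default behaviour of the pad_sequences helper in keras.preprocessing.sequence, but check the documentation for your Keras version. It assumes maxlen is a positive integer.

```python
import numpy as np

def pad_or_truncate(sequences, maxlen, value=0):
    """Hypothetical helper: force every sequence to exactly maxlen items.

    Short sequences are padded on the left with `value`; long sequences
    keep only their last maxlen items. Assumes maxlen >= 1.
    """
    out = np.full((len(sequences), maxlen), value, dtype="int64")
    for row, seq in enumerate(sequences):
        if len(seq) == 0:
            continue  # nothing to copy, the row stays all padding
        kept = list(seq)[-maxlen:]             # drop items from the front
        out[row, maxlen - len(kept):] = kept   # right-align the kept items
    return out

# Toy sequences of different lengths (assumed data).
sequences = [[1, 2, 3], [4, 5, 6, 7, 8, 9], [10]]
fixed = pad_or_truncate(sequences, maxlen=4)
print(fixed)
```

Every row of the result has the same length, so the whole batch can be passed to the model as a single 2-D array.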

noisefield answered Oct 21 '22 06:10