Traceback (most recent call last):
  File ".\keras_test.py", line 62, in <module>
    X_train = sequence.pad_sequences(X_train, maxlen=max_review_length)
  File "C:\Program Files\Python36\lib\site-packages\keras\preprocessing\sequence.py", line 69, in pad_sequences
    trunc = np.asarray(trunc, dtype=dtype)
  File "C:\Program Files\Python36\lib\site-packages\numpy\core\numeric.py", line 531, in asarray
    return array(a, dtype, copy=False, order=order)
ValueError: invalid literal for int() with base 10: "plus 've added commercials experience tacky"
Hi there. I'm getting this error when trying to use the pad_sequences function from Keras. X_train is a sequence of strings, and "plus 've added commercials experience tacky" is the first of them.
The pad_sequences function uses 'int32' as its default data type:
keras.preprocessing.sequence.pad_sequences(sequences, maxlen=None, dtype='int32',
padding='pre', truncating='pre', value=0.)
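The data you're passing is made of strings instead. pad_sequences tries to convert each sequence to an int32 array with np.asarray, and a string can't be converted, hence the ValueError.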
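A minimal reproduction, assuming only NumPy is installed. It mimics the conversion step shown in the traceback (np.asarray(..., dtype=dtype)) rather than calling Keras itself:

```python
import numpy as np

# pad_sequences (default dtype='int32') converts each truncated sequence
# with np.asarray(..., dtype=dtype). A raw string cannot become an int32 array.
review = "plus 've added commercials experience tacky"

try:
    np.asarray(review, dtype="int32")
except ValueError as err:
    print("ValueError:", err)

# A list of integer ids, on the other hand, converts without complaint.
ids = np.asarray([520, 2014, 4], dtype="int32")
print(ids)
```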
On top of that, a Keras model can't take strings as input.
You must "tokenize" those strings. Even if pad_sequences could pad strings, you would then have to decide which character to pad them with.
That's why you must create a dictionary that maps each character or word in your data to an integer id, and then transform all your strings into lists of ids.
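A minimal plain-Python sketch of that word-to-id dictionary. The helper names build_vocab and texts_to_ids are hypothetical, and reserving 0 for padding and 1 for unknown words is an assumption (it differs slightly from the id list below). Keras also ships a Tokenizer in keras.preprocessing.text (fit_on_texts / texts_to_sequences) that does a similar job.

```python
from collections import Counter

PAD_ID = 0  # reserved because pad_sequences pads with value=0 by default
OOV_ID = 1  # hypothetical id for words not seen when building the vocabulary


def build_vocab(texts):
    """Map each word to an integer id (starting at 2), most frequent first."""
    counts = Counter(word for text in texts for word in text.split())
    return {word: index
            for index, (word, _count) in enumerate(counts.most_common(), start=2)}


def texts_to_ids(texts, vocab):
    """Turn each string into a list of word ids; unknown words get OOV_ID."""
    return [[vocab.get(word, OOV_ID) for word in text.split()] for text in texts]


texts = ["plus 've added commercials experience tacky",
         "commercials are tacky"]
vocab = build_vocab(texts)
X_ids = texts_to_ids(texts, vocab)
print(X_ids)
```

Splitting on whitespace is the simplest possible tokenizer; real data usually needs lowercasing and punctuation handling first.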
Then you'd probably benefit from starting the model with an Embedding layer.
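Conceptually, an Embedding layer is a trainable lookup table with one row per word id. The NumPy sketch below only illustrates that lookup (it is not Keras's implementation); vocab_size, embedding_dim and the random weights are hypothetical values.

```python
import numpy as np

vocab_size = 10      # hypothetical: largest word id + 1
embedding_dim = 4    # hypothetical size of each word vector

# In Keras these weights would be learned during training; here they are random.
rng = np.random.default_rng(0)
embedding_matrix = rng.normal(size=(vocab_size, embedding_dim)).astype("float32")

# A batch of 2 padded reviews, each 5 word ids long (0 = padding).
batch_ids = np.array([[0, 4, 5, 6, 2],
                      [0, 0, 2, 8, 3]], dtype="int32")

# Indexing with the ids gives one vector per word: (batch, timesteps, dim).
vectors = embedding_matrix[batch_ids]
print(vectors.shape)
```

In Keras the equivalent first layer would be along the lines of Embedding(input_dim=vocab_size, output_dim=embedding_dim), fed with the padded integer ids.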
For example, if you're working with word ids:
Word 0: null word
Word 1: end of sentence
Word 2: space character (maybe not important to some languages)
Word 3: a
Word 4: added
Word 5: am
Word 6: and
...
Word 520: plus
Word 2014: 've
...
Then your first sentence would become a list like [520, 2014, 4, ...].
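Once every review is a list of ids, pad_sequences has integers to work with. As a NumPy-only sketch of what its defaults (padding='pre', truncating='pre', value=0) do, here is a hypothetical pad_pre helper; it is not a Keras function, and with Keras installed you would simply call sequence.pad_sequences(X_ids, maxlen=max_review_length) on the id lists.

```python
import numpy as np


def pad_pre(sequences, maxlen, value=0):
    """Mimic pad_sequences' defaults: pad and truncate at the front."""
    out = np.full((len(sequences), maxlen), value, dtype="int32")
    for row, seq in enumerate(sequences):
        trunc = seq[-maxlen:]               # truncating='pre': keep the last maxlen ids
        if len(trunc):
            out[row, -len(trunc):] = trunc  # padding='pre': filler values go in front
    return out


X_ids = [[520, 2014, 4], [7, 8, 9, 10, 11, 12]]
print(pad_pre(X_ids, maxlen=5))
```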