So I have a text classification model built with Keras. I've been trying to pad my varying-length sequences, but the Keras function pad_sequences() has only returned zeros.
I've figured out that with a list of integer sequences like the one below, it works just fine. But once the elements are floats (decimals), like in the second example, everything turns to zeros.
x = [[1, 2], [3,4,5], [4], [7,8,9,10]]
print(pad_sequences(x, padding='post'))
outputs:
[[ 1 2 0 0]
[ 3 4 5 0]
[ 4 0 0 0]
[ 7 8 9 10]]
But
x = [[.1, .2], [.3,.4,.5], [.4], [.7,.8,.9,.010]]
print(pad_sequences(x, padding='post'))
outputs:
[[ 0 0 0 0]
[ 0 0 0 0]
[ 0 0 0 0]
[ 0 0 0 0]]
And this:
x = [[.1, .2], [.3,.4,.5], [.4], [.7,.8,.9,.010]]
print(pad_sequences(x, padding='post', value=99))
outputs:
[[ 0 0 99 99]
[ 0 0 0 99]
[ 0 99 99 99]
[ 0 0 0 0]]
So I guess this function just ignores floats/decimals. Is there a way I can get around this?
This happens because the default dtype of pad_sequences is int32. All the values are therefore cast to integers, and values between 0 and 1, like these, are truncated to zero. The padding value 99 is already an integer, so it survives the cast unchanged. To resolve this, pass the dtype='float32' argument:
pad_sequences(x, padding='post', value=99, dtype='float32')
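To see why the dtype matters, here is a minimal NumPy-only sketch of post-padding. The function pad_post is hypothetical and only loosely mirrors pad_sequences(x, padding='post'); it is not the Keras implementation. It assumes the output array is allocated with the requested dtype, so every value written into it is cast to that dtype.

```python
import numpy as np


def pad_post(sequences, value=0, dtype="int32"):
    """Hypothetical sketch of post-padding; not the Keras implementation."""
    maxlen = max(len(s) for s in sequences)
    # Allocate the output filled with the padding value, in the requested dtype.
    out = np.full((len(sequences), maxlen), value, dtype=dtype)
    for i, seq in enumerate(sequences):
        # Writing floats into an int32 array truncates them toward zero,
        # so 0.1, 0.2, ... all become 0. With float32 they are kept.
        out[i, :len(seq)] = np.asarray(seq)
    return out


x = [[.1, .2], [.3, .4, .5], [.4], [.7, .8, .9, .010]]

print(pad_post(x, value=99))                   # int32: values truncated to 0
print(pad_post(x, value=99, dtype="float32"))  # float32: values preserved
```

With the int32 default, the first call gives the same zeros-and-99s pattern as in the question; with float32 the original decimals are kept and the padding becomes 99.0.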