I have a sequence-to-sequence learning model which works fine and is able to predict some outputs. The problem is that I have no idea how to convert the output back to a text sequence.
This is my code.
from keras.preprocessing.text import Tokenizer, base_filter
from keras.preprocessing.sequence import pad_sequences
from keras.models import Sequential
from keras.layers import Dense

txt1 = """What makes this problem difficult is that the sequences can vary in length,
be comprised of a very large vocabulary of input symbols and may require the model to
learn the long term context or dependencies between symbols in the input sequence."""

# txt1 is used for fitting
tk = Tokenizer(nb_words=2000, filters=base_filter(), lower=True, split=" ")
tk.fit_on_texts(txt1)

# convert text to sequence
t = tk.texts_to_sequences(txt1)

# padding to feed the sequence to the keras model
t = pad_sequences(t, maxlen=10)

model = Sequential()
model.add(Dense(10, input_dim=10))
model.add(Dense(10, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

# predicting a new sequence
pred = model.predict(t)

# convert predicted sequence to text
pred = ??
Keras provides the one_hot() function that you can use to tokenize and integer encode a text document in one step. The name suggests that it will create a one-hot encoding of the document, which is not the case. Instead, the function is a wrapper for the hashing_trick() function.
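For illustration, a minimal sketch of one_hot(); the sample document and the vocabulary size of 50 are arbitrary choices made up for this example:

# A minimal sketch of one_hot(); the sample document and the vocabulary
# size of 50 are arbitrary choices for illustration.
from keras.preprocessing.text import one_hot

doc = "The quick brown fox jumped over the lazy dog"
encoded = one_hot(doc, 50)
print(encoded)  # a list of integers (hashed word indices), e.g. [6, 16, 31, ...] - not one-hot vectors

Because the encoding is hash-based, no inverse mapping is stored, which is why the Tokenizer-based approaches below are easier to reverse back into text.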
The word_index attribute assigns a unique index to each word present in the text; this integer encoding is what the model sees during training. You can inspect it with print("The word index:", tk.word_index).
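To go from integers back to words, you can invert this mapping yourself. A minimal sketch, assuming tk is a Tokenizer that has already been fitted with fit_on_texts() on a list of texts (newer Keras versions also expose a ready-made index_word attribute):

# Invert word_index so predicted integer indices can be mapped back to words.
index_word = {index: word for word, index in tk.word_index.items()}

def sequence_to_text(seq):
    # index 0 is reserved for padding by pad_sequences and has no word,
    # so skip any index that is not in the vocabulary
    return " ".join(index_word[i] for i in seq if i in index_word)

print(sequence_to_text([3, 1, 7]))  # prints the words stored at indices 3, 1 and 7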
By default, all punctuation is removed, turning the texts into space-separated sequences of words (words may include the ' character). These sequences are then split into lists of tokens.
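A small sketch of that default filtering; the sample sentences are made up for illustration, with the expected output shown in comments:

from keras.preprocessing.text import Tokenizer

tk = Tokenizer()
tk.fit_on_texts(["Hello, world!", "Don't panic."])
print(tk.word_index)                                   # {'hello': 1, 'world': 2, "don't": 3, 'panic': 4}
print(tk.texts_to_sequences(["Don't panic, world"]))   # [[3, 4, 2]]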
You can directly use the inverse function tokenizer.sequences_to_texts:
text = tokenizer.sequences_to_texts(<list-of-integer-equivalent-encodings>)
I have tested the above and it works as expected.
P.S.: Take extra care to pass the list of integer encodings as the argument, not the one-hot ones.
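Putting the pieces together, here is a hedged end-to-end sketch. The names tokenizer, model and padded are illustrative, not the original poster's variables, and it assumes the model emits one softmax distribution over the vocabulary per time step (shape: samples x timesteps x vocab_size):

import numpy as np

pred = model.predict(padded)                 # per-step probability distributions, not words
pred_indices = np.argmax(pred, axis=-1)      # collapse each distribution to an integer encoding
pred_text = tokenizer.sequences_to_texts(pred_indices.tolist())
print(pred_text)                             # list of decoded strings, one per sample

The argmax step is what turns the softmax probabilities back into the integer encodings that sequences_to_texts expects; padding indices (0) and indices outside the fitted vocabulary simply produce no word.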