How can I untokenize the output of this code?
class Core:
    def __init__(self, user_input):
        pos = pop(user_input)
        subject = ""
        for token in pos:
            if token.dep == nsubj:
                subject = untokenize.untokenize(token)
        subject = S(subject)
I tried: https://pypi.org/project/untokenize/
MosesDetokenizer
.join()
But I get this error with the code from this post:
TypeError: 'spacy.tokens.token.Token' object is not iterable
This error for .join():
AttributeError: 'spacy.tokens.token.Token' object has no attribute 'join'
And for MosesDetokenizer:
text = u" {} ".format(" ".join(tokens))
TypeError: can only join an iterable
spaCy tokens keep their surrounding context, so the original text can be recreated without any loss of data.
In your case, all you have to do is:
''.join([token.text_with_ws for token in doc])
This works because the text_with_ws attribute holds the token's text together with its trailing whitespace character, if one exists.
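To make the round trip concrete, here is a minimal sketch. It assumes spaCy is installed and uses spacy.blank("en"), a tokenizer-only pipeline that needs no model download; the sample sentence is only an illustration and is not from the question.

```python
import spacy

# Assumption: a blank English pipeline (tokenizer only) is enough here,
# because text_with_ws comes from the tokenizer, not from the parser.
nlp = spacy.blank("en")

text = "I like cake. Do you?"
doc = nlp(text)

# Each token carries its own trailing whitespace, so joining with an
# empty separator rebuilds the input exactly.
rebuilt = "".join(token.text_with_ws for token in doc)

print(rebuilt == text)
```

Note that the join is over all tokens of the doc. The errors in the question come from passing a single Token where an iterable of tokens or strings is expected.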
spaCy tokens keep a reference to the Doc object they belong to, so this will give you the original sentence as a string:
import spacy
nlp = spacy.load('en_core_web_sm')  # the 'en' shortcut no longer works in spaCy v3
doc = nlp("I like cake.")
token = doc[0]
print(token.doc)  # prints "I like cake."
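If only the matching token is wanted rather than the whole sentence, a single Token already exposes its string form, so no detokenizer is needed. The sketch below is an assumption about what the question is after; it uses spacy.blank("en") to avoid a model download, and the variable names are illustrative.

```python
import spacy

# Assumption: tokenizer-only pipeline, since only token text is inspected.
nlp = spacy.blank("en")
doc = nlp("I like cake.")
token = doc[0]

single = token.text      # the token itself as a plain string
whole = token.doc.text   # the full original text, via the parent Doc

# str.join needs an iterable of strings, so a lone token must be
# wrapped in a list (or just use token.text directly).
joined = " ".join([token.text])

print(single, "|", whole, "|", joined)
```

In the loop from the question, this would suggest assigning token.text to subject when the dependency label matches, instead of calling untokenize on the Token.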