How can I untokenize a spacy.tokens.token.Token?

Question

How can I untokenize the output of this code?

class Core:

    def __init__(self, user_input):
        pos = pop(user_input)
        subject = ""
        for token in pos:
            if token.dep == nsubj:
                subject = untokenize.untokenize(token)
        subject = S(subject)

I tried the untokenize package (https://pypi.org/project/untokenize/), MosesDetokenizer, and .join().

But my latest code (from this post) gives this error:

TypeError: 'spacy.tokens.token.Token' object is not iterable

This is the error for .join():

AttributeError: 'spacy.tokens.token.Token' object has no attribute 'join'

And for MosesDetokenizer:

text = u" {} ".format(" ".join(tokens))
TypeError: can only join an iterable

Nathan McCoy · Accepted Answer

Every token in spaCy keeps its surrounding context, so the original text can be recreated without any loss of data.

In your case, all you have to do is:

''.join([token.text_with_ws for token in doc])

This works because the text_with_ws attribute holds the token's text plus its trailing whitespace character, if one exists.
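As a minimal sketch of that round trip, the snippet below joins text_with_ws over a whole Doc and compares the result with the input. It assumes spaCy is installed and uses a blank English pipeline (tokenizer only), which is an assumption made so that no model download is needed; the sample sentence is made up for illustration.

```python
import spacy

# Assumption: a blank English pipeline (tokenizer only) is enough here,
# because text_with_ws depends only on tokenization.
nlp = spacy.blank("en")

# Hypothetical sample text with irregular spacing.
text = "Hello,  world! Isn't this   nice?"
doc = nlp(text)

# Each token contributes its text plus any trailing whitespace.
rebuilt = "".join(token.text_with_ws for token in doc)

print(rebuilt == text)      # compares against the original input
print(rebuilt == doc.text)  # doc.text is the same reconstruction
```

Note that the join runs over the tokens of a Doc (or a Span). A single Token is not iterable, which is what the errors in the question point at; for one token, token.text or token.text_with_ws is already a plain string.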

polm23 · Answer

spaCy tokens keep a reference to their parent Doc object, so this will give you back the original text (printing the Doc shows it as a string):

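import spacy

# spaCy v3 removed the 'en' shortcut; load a model by its full package name.
nlp = spacy.load("en_core_web_sm")
doc = nlp("I like cake.")
token = doc[0]

print(token.doc)       # prints "I like cake."
print(token.doc.text)  # the same text as a plain str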
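Applied to the loop in the question, one possible approach is to read the string straight off the subject token, or to join text_with_ws over token.subtree if the whole subject phrase is wanted. The sketch below is only an illustration: it assumes spaCy v3 (whose Doc constructor accepts heads and deps), and the sentence and its parse are written out by hand so it runs without a downloaded model. With a trained pipeline, nlp(user_input) would supply the parse instead.

```python
import spacy
from spacy.tokens import Doc

# Assumption: the dependency parse is hand-written for illustration only.
vocab = spacy.blank("en").vocab
words = ["The", "old", "dog", "likes", "cake", "."]
spaces = [True, True, True, True, False, False]
heads = [2, 2, 3, 3, 3, 3]  # absolute index of each token's head
deps = ["det", "amod", "nsubj", "ROOT", "dobj", "punct"]
doc = Doc(vocab, words=words, spaces=spaces, heads=heads, deps=deps)

subject_word = ""
subject_phrase = ""
for token in doc:
    if token.dep_ == "nsubj":
        # token.text is already a str, so no detokenizer is needed.
        subject_word = token.text
        # The subtree covers the subject together with its modifiers.
        subject_phrase = "".join(t.text_with_ws for t in token.subtree).strip()

print(subject_word)
print(subject_phrase)
```

The comparison uses token.dep_, the string form of the label; token.dep is the integer ID of the same label.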

Tags: python, token, nlp, nltk, spacy

RazvanP
