Is it possible to use spacy with already tokenized input?

Tags: python, nlp, spacy

I have a sentence that has already been tokenized into words, and I want the part-of-speech tag for each word. The spaCy documentation always starts from the raw sentence. I don't want to do that, because spaCy might then tokenize the sentence differently from my existing tokens. Is it possible to run spaCy on a list of words rather than on a string?

Here is an example of what I mean:

# I know that the following works:
import spacy
nlp = spacy.load('en_core_web_sm')
raw_text = 'Hello, world.'
doc = nlp(raw_text)
for token in doc:
    print(token.pos_)

But I want to do something similar to the following:

import spacy
nlp = spacy.load('en_core_web_sm')
tokenized_text = ['Hello',',','world','.']
doc = nlp(tokenized_text)
for token in doc:
    print(token.pos_)

I know this doesn't work, but is there a way to do something similar?

868

asked Dec 03 '18 13:12

zwlayer

1 Answer

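Use the Doc object. Build a Doc directly from your list of tokens and pass it to nlp (recent spaCy v3 releases accept a Doc here). The tokenizer is then skipped, and the rest of the pipeline runs on your tokens as given: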

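import spacy
from spacy.tokens import Doc
nlp = spacy.load("en_core_web_sm")

# Each sentence is already a list of tokens
sents = [['Hello', ',', 'world', '.']]
for sent in sents:
    # Create a Doc from the tokens, bypassing the tokenizer
    doc = Doc(nlp.vocab, words=sent)
    # Run the remaining pipeline components (tagger, parser, ...) on it
    for token in nlp(doc):
        print(token.text, token.pos_)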
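If your spaCy version does not accept a Doc in nlp(...), a common alternative is to call each pipeline component on the Doc yourself. The sketch below also passes the optional spaces argument so that doc.text reproduces the original string. To keep it runnable without a model download, it assumes a blank English pipeline (spacy.blank("en")), which has no tagger, so token.pos_ stays empty here; swap in spacy.load("en_core_web_sm") to get actual tags. The spaces values are my own illustration for this sentence.

```python
import spacy
from spacy.tokens import Doc

# Assumption: a blank pipeline so the sketch runs without a downloaded model.
# Replace with spacy.load("en_core_web_sm") to get real POS tags.
nlp = spacy.blank("en")

words = ["Hello", ",", "world", "."]
# Whether each token is followed by a space in the original text
spaces = [False, True, False, False]

# Build the Doc from the existing tokens; the tokenizer is never called
doc = Doc(nlp.vocab, words=words, spaces=spaces)

# Apply every pipeline component in order (a no-op for a blank pipeline)
for name, component in nlp.pipeline:
    doc = component(doc)

tokens = [token.text for token in doc]
print(tokens)    # the tokens are exactly the ones passed in
print(doc.text)  # the text rebuilt from words and spaces
```

Without spaces, spaCy assumes every token is followed by a space, which only affects doc.text and not the tokens themselves.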

146

answered Oct 23 '22 13:10

Victor Yan

