I am parsing a sentence with spaCy as follows:
import spacy
nlp = spacy.load("en_core_web_sm")
span = nlp("This is some text.")
I am wondering whether there is a way to delete a word from the span while keeping the remaining words formatted as a sentence, for example:
del span[3]
which would yield a sentence like
This is some.
If another method that doesn't use spaCy can achieve the same effect, that would be great too.
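Without spaCy, a rough approximation is to split the sentence into words and punctuation with a regular expression, delete the token at the given position, and join the pieces back together so that punctuation attaches to the preceding word. This is only a minimal sketch: delete_word is a hypothetical helper, and the regex tokenizer is far cruder than spaCy's (for example, it mangles contractions such as "don't" and attaches opening quotes or brackets to the wrong side).

```python
import re

def delete_word(sentence: str, index: int) -> str:
    """Hypothetical helper: drop the index-th token and rebuild the sentence.

    Words and punctuation marks count as separate tokens, as in spaCy.
    """
    # Crude tokenizer (an assumption, not spaCy's): runs of word chars, or single punctuation marks
    tokens = re.findall(r"\w+|[^\w\s]", sentence)
    del tokens[index]
    out = ""
    for tok in tokens:
        # Words are separated by a space; punctuation attaches to the previous token
        if out and not re.fullmatch(r"[^\w\s]", tok):
            out += " "
        out += tok
    return out

print(delete_word("This is some text.", 3))  # prints: This is some.
```

The index counts punctuation as a separate token, so it lines up with span[3] in the question.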
According to spaCy's documentation, a Token represents a single word, punctuation symbol, whitespace, etc. from a document, while a Span is a slice of the document. In other words, a Span is an ordered sequence of Tokens.
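A short illustration of the difference, assuming spaCy v3 is installed. spacy.blank("en") creates a tokenizer-only English pipeline, so no trained model has to be downloaded. Indexing a Doc returns a Token; slicing it returns a Span over the same underlying tokens.

```python
import spacy

# Tokenizer-only English pipeline: no trained model required
nlp = spacy.blank("en")
doc = nlp("This is some text.")

token = doc[3]   # indexing a Doc gives a single Token
span = doc[1:4]  # slicing a Doc gives a Span covering tokens 1, 2 and 3

print(type(token).__name__, token.text)  # Token text
print(type(span).__name__, span.text)    # Span is some text
print([t.text for t in span])            # ['is', 'some', 'text']
```

As far as I can tell, neither Doc nor Span supports item deletion (del span[0] raises a TypeError), which is why the workarounds here build a new Doc or a new string instead of editing in place.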
spaCy accurately labels “awesome” as an adjectival modifier (amod) and also detects its relationship to “buffet”:
for token in doc:
    if token.dep_ == 'amod':
        print(f"ADJ MODIFIER: {token.text} --> NOUN: {token.head}")
The ents property: this Doc property holds the named entities in the document. If the entity recognizer has been applied, it returns a tuple of named-entity Span objects.
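A small sketch of what doc.ents returns, again assuming spaCy v3 and a blank pipeline. A blank pipeline has no entity recognizer, so the entities below are set by hand purely for illustration (the ORG and GPE labels are assumptions, not model output). With a trained pipeline such as en_core_web_sm, the recognizer fills doc.ents for you.

```python
import spacy
from spacy.tokens import Span

nlp = spacy.blank("en")  # no entity recognizer in a blank pipeline
doc = nlp("Apple is opening an office in Berlin.")

# Hypothetical annotations set by hand; a trained NER component would normally do this
doc.ents = [Span(doc, 0, 1, label="ORG"), Span(doc, 6, 7, label="GPE")]

print(type(doc.ents).__name__)  # tuple
for ent in doc.ents:
    # Each entity is a Span with a label and token offsets into the Doc
    print(ent.text, ent.label_, ent.start, ent.end)
```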
There is a workaround for that.
The idea is to create a NumPy array from the doc, delete the row for the token you don't want, and then create a new doc from the remaining words and the edited array.
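import spacy
from spacy.attrs import LOWER, POS, ENT_TYPE, IS_ALPHA
from spacy.tokens import Doc
import numpy

def remove_span(doc, index):
    # Export the selected token attributes as an (n_tokens, n_attrs) array
    np_array = doc.to_array([LOWER, POS, ENT_TYPE, IS_ALPHA])
    # Drop the row that belongs to the token at `index`
    np_array_2 = numpy.delete(np_array, index, axis=0)
    # Build a new Doc from the remaining words...
    doc2 = Doc(doc.vocab, words=[t.text for i, t in enumerate(doc) if i != index])
    # ...and copy the attributes back onto the new tokens
    doc2.from_array([LOWER, POS, ENT_TYPE, IS_ALPHA], np_array_2)
    return doc2

# load the English model (the "en" shortcut no longer works in spaCy v3)
nlp = spacy.load('en_core_web_sm')
doc = nlp("This is some text")
new_doc = remove_span(doc, 3)
print(new_doc)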
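If you only need the sentence text back, a lighter alternative (not the approach above) is to rebuild the string from each token's text and trailing whitespace, skipping the token you want to drop. delete_token is a hypothetical helper; the one subtlety is that the token before the deleted one should inherit the deleted token's trailing whitespace, so "some text." becomes "some." rather than "some .". This sketch assumes spaCy v3 and uses a blank pipeline only so it runs without a model download.

```python
import spacy

def delete_token(doc, index):
    """Hypothetical helper: return doc's text with the token at `index` removed."""
    parts = []
    for tok in doc:
        if tok.i == index:
            continue  # skip the deleted token entirely
        if tok.i == index - 1:
            # The preceding token takes over the deleted token's trailing whitespace
            parts.append(tok.text + doc[index].whitespace_)
        else:
            parts.append(tok.text_with_ws)
    return "".join(parts)

nlp = spacy.blank("en")
doc = nlp("This is some text.")
text = delete_token(doc, 3)
print(text)  # This is some.
new_doc = nlp(text)  # re-run the pipeline to get a fresh Doc
```

Re-parsing with nlp(text) recomputes tags and entities from scratch, which may differ from the original analysis; the NumPy approach above carries the original attributes over instead.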
Hope it helps!