
Remove a word from a span in spaCy?

I am parsing a sentence with spaCy as follows:

import spacy
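nlp = spacy.load("en_core_web_sm")  # spaCy v3 removed the "en" shortcut
span = nlp("This is some text.")  # nlp(...) returns a Doc; slicing a Doc gives a Span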

I am wondering whether there is a way to delete a word from the span while keeping the remaining words formatted as a sentence, for example:

del span[3]

which could yield a sentence like

This is some.

A method that achieves the same effect without spaCy would be great too.
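Since a spaCy-free method is also welcome, here is a minimal sketch using only the standard library. It assumes a crude regex tokenizer (a word is a run of letters, digits or underscores, and every other non-space character is its own token), which is far simpler than spaCy's tokenizer; remove_word and TOKEN_RE are hypothetical names. Each token keeps its trailing whitespace, loosely like spaCy's Token.whitespace_, and the removed token's whitespace is handed to the token before it so the result still reads like a sentence:

```python
import re

# Each match is a (token, trailing_whitespace) pair. Words are runs of \w;
# every other non-space character (e.g. ".") becomes a token on its own.
# Assumption: this is a crude stand-in for a real tokenizer.
TOKEN_RE = re.compile(r"(\w+|[^\w\s])(\s*)")


def remove_word(text: str, index: int) -> str:
    """Return `text` with the token at position `index` removed."""
    pairs = TOKEN_RE.findall(text)
    if not 0 <= index < len(pairs):
        raise IndexError(f"token index {index} out of range")
    # Hand the removed token's trailing whitespace to the previous token,
    # so "some text." becomes "some." rather than "some .".
    if index > 0:
        prev_token, _ = pairs[index - 1]
        pairs[index - 1] = (prev_token, pairs[index][1])
    del pairs[index]
    return "".join(tok + ws for tok, ws in pairs)


print(remove_word("This is some text.", 3))  # This is some.
```

Capitalization is not adjusted, so removing the first word of a sentence leaves the new first word in lowercase.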

ZEWEI CHU asked Sep 05 '18 21:09


People also ask

What is a span in spaCy?

From spaCy's documentation, a Token represents a single word, punctuation symbol, whitespace, etc. from a document, while a Span is a slice from the document. In other words, a Span is an ordered sequence of Tokens.
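To make the Doc/Span distinction concrete, here is a small sketch that slices a Doc into a Span. It assumes spaCy v3 is installed and uses spacy.blank("en"), which provides only a tokenizer, so no trained model has to be downloaded. The Span is a view over the Doc's own tokens rather than a copy:

```python
import spacy
from spacy.tokens import Doc, Span

# A blank English pipeline: tokenizer only, no trained components.
nlp = spacy.blank("en")

doc = nlp("This is some text.")
span = doc[1:4]  # tokens 1, 2 and 3

print(isinstance(doc, Doc), isinstance(span, Span))  # True True
print([t.text for t in span])  # ['is', 'some', 'text']
print(span.text)  # is some text
# The Span points back into the Doc: token indices refer to the Doc.
print(span[0].i, span.doc is doc)  # 1 True
```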

What is Amod in spaCy?

spaCy accurately labels “awesome” as an adjectival modifier (amod) and also detects its relationship to “buffet”:

for token in doc:
    if token.dep_ == 'amod':
        print(f"ADJ MODIFIER: {token.text} --> NOUN: {token.head}")

What is doc ents in spaCy?

Doc.ents is the property that holds the named entities in the document. If the entity recognizer has been applied, it returns a tuple of named-entity Span objects.


1 Answer

There is a workaround for that.

The idea is to export the doc's token attributes to a numpy array, delete the row for the token you don't want, then build a new Doc from the remaining words and load the trimmed array back into it.
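The row-deletion step relies on numpy.delete with axis=0. Doc.to_array returns a 2-D array with one row per token and one column per requested attribute, so deleting a row removes one token together with all of its attributes. A minimal sketch with a made-up array standing in for the to_array output (the values are arbitrary placeholders, not real spaCy IDs):

```python
import numpy as np

# Stand-in for doc.to_array([...]): one row per token, one column per
# attribute. The numbers are placeholders, not real spaCy attribute IDs.
attrs = np.array(
    [
        [10, 1],  # token 0
        [11, 2],  # token 1
        [12, 3],  # token 2
        [13, 4],  # token 3
    ],
    dtype=np.uint64,
)

# axis=0 deletes a whole row, i.e. one token and all of its attributes.
# numpy.delete returns a new array and leaves `attrs` unchanged.
trimmed = np.delete(attrs, 3, axis=0)
print(trimmed.shape)  # (3, 2)
```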

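import spacy
import numpy
from spacy.attrs import LOWER, POS, ENT_TYPE, IS_ALPHA
from spacy.tokens import Doc

ATTRS = [LOWER, POS, ENT_TYPE, IS_ALPHA]

def remove_span(doc, index):
    # One row per token, one column per attribute in ATTRS
    np_array = doc.to_array(ATTRS)
    # Drop the row belonging to the token at `index`
    np_array_2 = numpy.delete(np_array, index, axis=0)
    # Rebuild a Doc from the remaining words; every token gets a trailing
    # space, so the original spacing is not preserved
    doc2 = Doc(doc.vocab, words=[t.text for i, t in enumerate(doc) if i != index])
    # Copy the remaining tokens' attributes onto the new Doc
    doc2.from_array(ATTRS, np_array_2)
    return doc2

# load the English model (spaCy v3 no longer accepts the "en" shortcut)
nlp = spacy.load('en_core_web_sm')
doc = nlp("This is some text")
new_doc = remove_span(doc, 3)
print(new_doc)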

Hope it helps!
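A variant sketch, assuming spaCy v3: the code above builds the new Doc from words alone, so every token gets a trailing space and the original spacing is lost. Passing a spaces list (one boolean per word) to the Doc constructor keeps the spacing, which gives the "This is some." output asked for in the question. remove_token is a hypothetical helper; it copies no annotations (POS tags, entities), so combine it with the to_array/from_array step above if you need them. spacy.blank("en") is used only so the example runs without a trained model:

```python
import spacy
from spacy.tokens import Doc


def remove_token(doc: Doc, index: int) -> Doc:
    """Return a new Doc without the token at `index`, keeping spacing."""
    if not 0 <= index < len(doc):
        raise IndexError(f"token index {index} out of range")
    words = [t.text for t in doc]
    # True if the token was followed by whitespace in the original text
    spaces = [bool(t.whitespace_) for t in doc]
    # Give the removed token's trailing space to the token before it,
    # so "some text." becomes "some." instead of "some ."
    if index > 0:
        spaces[index - 1] = spaces[index]
    del words[index]
    del spaces[index]
    return Doc(doc.vocab, words=words, spaces=spaces)


nlp = spacy.blank("en")  # tokenizer only, no model download needed
doc = nlp("This is some text.")
print(remove_token(doc, 3).text)  # This is some.
```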

gdaras answered Oct 20 '22 08:10