 

Get position of word in sentence with spaCy


I'm aware of the basic spaCy workflow for getting various attributes from a document. However, I can't find a built-in function that returns the position (start/end) of a word within a sentence.

Would anyone know if this is possible with spaCy?

jack west asked Sep 05 '17 07:09




1 Answer

These are available as attributes of each token in the parsed document. The spaCy documentation for Token says:

idx (int): The character offset of the token within the parent document.

i (int): The index of the token within the parent document.

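>>> import spacy
>>> # The 'en' shortcut no longer works in spaCy v3; this assumes the
>>> # en_core_web_sm model is installed.
>>> nlp = spacy.load('en_core_web_sm')
>>> parsed_sentence = nlp('This is my sentence')
>>> # token.i is the token's index within the document
>>> [(token.text, token.i) for token in parsed_sentence]
[('This', 0), ('is', 1), ('my', 2), ('sentence', 3)]
>>> # token.idx is the character offset at which the token starts
>>> [(token.text, token.idx) for token in parsed_sentence]
[('This', 0), ('is', 5), ('my', 8), ('sentence', 11)]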
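
The question also asks for the end position, which idx alone does not give. A minimal sketch, assuming the end offset can be taken as the start offset plus the length of the token text: the helper name token_char_spans is hypothetical (not part of spaCy), and spacy.blank("en") is used here only because a tokenizer is enough for offsets.

```python
def token_char_spans(doc):
    """Return (text, start, end) for every token in `doc`.

    `start` is token.idx (character offset in the parent document);
    `end` is exclusive, so the original text sliced as [start:end]
    gives back the token text. This helper is illustrative only and
    relies on nothing but the .text and .idx token attributes.
    """
    return [(tok.text, tok.idx, tok.idx + len(tok.text)) for tok in doc]


if __name__ == "__main__":
    import spacy

    # A blank English pipeline only tokenizes; no trained model is needed.
    nlp = spacy.blank("en")
    doc = nlp("This is my sentence")

    for text, start, end in token_char_spans(doc):
        # Slicing the document text with the span recovers the token.
        assert doc.text[start:end] == text
        print(text, start, end)
    # This 0 4
    # is 5 7
    # my 8 10
    # sentence 11 19
```

For multi-token spans (for example a sentence from doc.sents or a slice such as doc[1:3]), spaCy's Span objects expose start_char and end_char directly, so the arithmetic above is only needed for individual tokens.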
DhruvPathak answered Sep 16 '22 12:09
