
Spacy Pipeline?

Tags:

python

nlp

spacy

So lately I've been playing around with a Wikipedia dump. I preprocessed it and trained a Word2Vec model on it with Gensim.

Does anyone know if there is a single script within spaCy that would generate tokenization, sentence recognition, part-of-speech tagging, lemmatization, dependency parsing, and named entity recognition all at once?

I have not been able to find clear documentation on this. Thank you!

Asked by Silas on Aug 17 '16


People also ask

What is spaCy pipeline?

Fundamentally, a spaCy pipeline package consists of three components: the weights, i.e. binary data loaded in from a directory, a pipeline of functions called in order, and language data like the tokenization rules and language-specific settings.
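For illustration, a minimal sketch of inspecting those pieces (this assumes spaCy v2 or later and that the en_core_web_sm package has been downloaded):

import spacy

# A minimal sketch, assuming spaCy v2+ and that en_core_web_sm is installed
# (python -m spacy download en_core_web_sm).
nlp = spacy.load("en_core_web_sm")

# The pipeline of functions called in order after tokenization:
print(nlp.pipe_names)   # e.g. ['tagger', 'parser', 'ner'] (varies by version)
# The (name, component) pairs, i.e. the actual callables:
print(nlp.pipeline)
# Language data such as tokenization rules lives on nlp.Defaults and nlp.vocab.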

How do you load a spaCy pipeline?

To load a pipeline from a data directory, you can use spacy.load() with the local path. This will look for a config.cfg in the directory and use the lang and pipeline settings to initialize a Language class with a processing pipeline and load in the model data.
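A short sketch of loading from a local directory (the path below is hypothetical; this assumes spaCy v3+, where the directory contains a config.cfg, for example one written out earlier with nlp.to_disk()):

import spacy

# A minimal sketch, assuming spaCy v3+. The path below is hypothetical and
# must point at a directory containing config.cfg plus the model data
# (e.g. a directory created earlier with nlp.to_disk("/path/to/my_pipeline")).
nlp = spacy.load("/path/to/my_pipeline")

doc = nlp("Loading from a local directory works just like loading a package.")
print([(tok.text, tok.pos_) for tok in doc])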

Is spaCy or NLTK better?

While NLTK provides access to many algorithms to get something done, spaCy provides the best way to do it. It provides the fastest and most accurate syntactic analysis of any NLP library released to date. It also offers access to larger word vectors that are easier to customize.

What is NLP pipeline?

An NLP pipeline is the set of steps followed to build end-to-end NLP software. Before we start, we have to remember a few things: the pipeline is not universal, deep learning pipelines are slightly different, and the pipeline is non-linear.
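As a rough illustration (not tied to any particular library), such a pipeline can be thought of as a chain of steps, each feeding the next:

# A minimal, generic sketch of an NLP pipeline as chained steps.
# The function names are illustrative only, not from any specific library.
def clean(text):
    # text acquisition / cleanup
    return text.strip().lower()

def tokenize(text):
    # text preparation
    return text.split()

def extract_features(tokens):
    # feature engineering, e.g. simple term counts
    counts = {}
    for tok in tokens:
        counts[tok] = counts.get(tok, 0) + 1
    return counts

def run_pipeline(text):
    # steps are applied in order here, though in practice the flow is often non-linear
    return extract_features(tokenize(clean(text)))

print(run_pipeline("Hello, world. Here are two sentences."))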


1 Answer

spaCy gives you all of that just by using en_nlp = spacy.load('en'); doc = en_nlp(sentence). The documentation gives you details about how to access each of the elements.

An example is given below:

In [1]: import spacy
   ...: en_nlp = spacy.load('en')

In [2]: en_doc = en_nlp(u'Hello, world. Here are two sentences.')

Sentences can be obtained by using doc.sents:

In [4]: list(en_doc.sents)
Out[4]: [Hello, world., Here are two sentences.]

Noun chunks are given by doc.noun_chunks:

In [6]: list(en_doc.noun_chunks)
Out[6]: [two sentences]

Named entities are given by doc.ents:

In [11]: [(ent, ent.label_) for ent in en_doc.ents]
Out[11]: [(two, u'CARDINAL')]

Tokenization: you can iterate over the doc to get tokens; token.orth_ gives the string form of the token.

In [12]: [tok.orth_ for tok in en_doc]
Out[12]: [u'Hello', u',', u'world', u'.', u'Here', u'are', u'two', u'sentences', u'.']

The POS tag is given by token.tag_:

In [13]: [tok.tag_ for tok in en_doc]
Out[13]: [u'UH', u',', u'NN', u'.', u'RB', u'VBP', u'CD', u'NNS', u'.']

Lemmatization:

In [15]: [tok.lemma_ for tok in en_doc]
Out[15]: [u'hello', u',', u'world', u'.', u'here', u'be', u'two', u'sentence', u'.']

Dependency parsing: you can traverse the parse tree by using token.dep_, token.rights, or token.lefts. For example, you can print the dependencies:

In [19]: for token in en_doc:
    ...:     print(token.orth_, token.dep_, token.head.orth_, [t.orth_ for t in token.lefts], [t.orth_ for t in token.rights])
    ...:     
(u'Hello', u'ROOT', u'Hello', [], [u',', u'world', u'.'])
(u',', u'punct', u'Hello', [], [])
(u'world', u'npadvmod', u'Hello', [], [])
...

For more details, please consult the spaCy documentation.
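For reference, here is the same walkthrough condensed into a single script with current model names (a sketch, assuming en_core_web_sm is installed; in recent spaCy versions the 'en' shortcut used above has been replaced by full package names):

import spacy

# A consolidated sketch, assuming a modern spaCy model such as en_core_web_sm
# is installed; in spaCy v1.x the equivalent was spacy.load('en').
nlp = spacy.load("en_core_web_sm")
doc = nlp("Hello, world. Here are two sentences.")

print(list(doc.sents))                               # sentence segmentation
print(list(doc.noun_chunks))                         # noun chunks
print([(ent.text, ent.label_) for ent in doc.ents])  # named entity recognition
print([(t.text, t.tag_, t.lemma_, t.dep_, t.head.text) for t in doc])  # POS, lemma, dependencies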

Answered by CentAu on Oct 23 '22