I am creating a PhraseMatcher with spaCy like this:
import spacy
import time
from spacy.matcher import PhraseMatcher

nlp = spacy.load("en")
label = "SKILL"
print("Creating the matcher...")
start = time.time()
matcher = PhraseMatcher(nlp.vocab)
for i in list_skills:
    matcher.add(label, None, nlp(i))
My list_skills is very big, so creating the matcher takes a long time, and I reuse it often. Is there a way to save the matcher to disk and reload it later without having to recreate it every time?
You can save some time initially by using nlp.tokenizer.pipe() to process your texts:

for doc in nlp.tokenizer.pipe(list_skills):
    matcher.add(label, None, doc)
This just tokenizes, which is much faster than running the full en pipeline. If you're using certain attr settings with PhraseMatcher, you may need nlp.pipe() instead, but you should get an error if this is the case.
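Putting that together, a minimal self-contained sketch of the tokenizer-only approach might look like this. The list_skills contents are placeholders, and spacy.blank("en") is used here as an assumption so no model download is needed (the default ORTH attribute only needs the tokenizer):

```python
import spacy
from spacy.matcher import PhraseMatcher

# Assumption: a blank pipeline (tokenizer only) is enough for the
# default ORTH-based matching; no trained model is required.
nlp = spacy.blank("en")
list_skills = ["python", "machine learning", "data analysis"]  # placeholder data

matcher = PhraseMatcher(nlp.vocab)
# nlp.tokenizer.pipe() only tokenizes, skipping the tagger/parser/NER
for doc in nlp.tokenizer.pipe(list_skills):
    # Newer spaCy API; older versions use matcher.add("SKILL", None, doc)
    matcher.add("SKILL", [doc])
```

All three patterns end up under the single "SKILL" key, so len(matcher) reports one rule.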
You can pickle a PhraseMatcher to save it to disk. Unpickling is not extremely fast because it has to reconstruct some internal data structures, but it should be quite a bit faster than creating the PhraseMatcher from scratch.
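A short sketch of the pickle round trip, again using a blank pipeline and placeholder patterns as assumptions:

```python
import pickle
import spacy
from spacy.matcher import PhraseMatcher

# Assumption: blank tokenizer-only pipeline, placeholder skill phrases
nlp = spacy.blank("en")
matcher = PhraseMatcher(nlp.vocab)
patterns = [nlp.tokenizer(text) for text in ["machine learning", "data analysis"]]
matcher.add("SKILL", patterns)  # older spaCy: matcher.add("SKILL", None, *patterns)

# Save the matcher to disk
with open("matcher.pkl", "wb") as f:
    pickle.dump(matcher, f)

# Later: reload it without rebuilding the patterns
with open("matcher.pkl", "rb") as f:
    restored = pickle.load(f)

doc = nlp.tokenizer("I enjoy machine learning projects")
matches = restored(doc)  # list of (match_id, start, end) tuples
```

spaCy's string hashes are deterministic, so the restored matcher's ORTH-based patterns still match documents tokenized by the original pipeline.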