
Multi-threaded training for spaCy in Python

I am trying to find a way to use multi-threading in spaCy when training an NER model. It looks like multi-threading is used by default on my work computer (Ubuntu 16.04, Python 3.5) but not on my server.

Any idea why?

Info about spaCy & env on server

Platform           Linux-3.14.32-xxxx-grs-ipv6-64-x86_64-with-Debian-8
Python version     3.4.2          
Location           /home/nlp/.env/lib/python3.4/site-packages/spacy
Models             fr, fr_core_news_md
spaCy version      2.0.5

Steps to reproduce:

Installation

python3 -m venv .env
source .env/bin/activate
pip install -U spacy
pip3 install pip --upgrade
python -m spacy download fr
python -m spacy validate

Python 3 script

import spacy
import random

ITERATION_NBR = 100
DROP_RATE = 0.5

TRAIN_DATA = [
    ('Who is Shaka Khan?', {
        'entities': [(7, 17, 'PERSON')]
    }),
    ('I like London and Berlin.', {
        'entities': [(7, 13, 'LOC'), (18, 24, 'LOC')]
    })
]

def main():
    try:
        nlp = spacy.load("fr")
    except OSError:
        # spacy.load raises OSError (IOError) when the model is not installed
        nlp = spacy.load("fr_core_news_sm")
    if 'ner' not in nlp.pipe_names:
        ner = nlp.create_pipe('ner')
        nlp.add_pipe(ner, last=True)
    else:
        ner = nlp.get_pipe('ner')
    for _, annotations in TRAIN_DATA:
        for ent in annotations.get('entities'):
            ner.add_label(ent[2])
    other_pipes = [pipe for pipe in nlp.pipe_names if pipe != 'ner']
    with nlp.disable_pipes(*other_pipes):
        optimizer = nlp.begin_training()
        for itn in range(ITERATION_NBR):
            random.shuffle(TRAIN_DATA)
            losses = {}
            for text, annotations in TRAIN_DATA:
                nlp.update(
                    [text],
                    [annotations],
                    drop=DROP_RATE,
                    sgd=optimizer,
                    losses=losses)

# Without this guard, main() is defined but never called
if __name__ == '__main__':
    main()

Execution

python3 <scriptName>.py
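
One rough way to check whether a run really uses more than one core is to compare CPU time with wall-clock time. The sketch below is only an illustration: `cpu_to_wall_ratio` is a hypothetical helper name, and the workload in the example is a stand-in for the training loop. A ratio close to 1 suggests a single busy thread, while a ratio well above 1 suggests several threads doing work at once.

```python
import time


def cpu_to_wall_ratio(work):
    """Run work() once and return CPU seconds divided by wall-clock seconds."""
    wall_start = time.perf_counter()
    # process_time() sums user and system CPU time over all threads of this process
    cpu_start = time.process_time()
    work()
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    # Guard against a zero wall-clock interval on very short workloads
    return cpu_used / max(wall_used, 1e-9)


if __name__ == '__main__':
    # Stand-in workload (assumption); replace the lambda with main from the script above
    ratio = cpu_to_wall_ratio(lambda: sum(i * i for i in range(10 ** 6)))
    print('CPU/wall ratio: {:.2f}'.format(ratio))
```

Running this on both machines with the same training script gives a comparable number, which is easier to reason about than watching `top`.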
asked May 28 '26 07:05 by Thoc


1 Answer

Python >= 3.5 is needed for multi-threading to work by default when training spaCy. The server was running Python 3.4.2, which explains the difference from my work computer.
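
A quick way to compare the two machines is to check the interpreter version inside the virtualenv that runs the training. This is a minimal sketch: the `(3, 5)` threshold comes from the observation above rather than from spaCy's documentation, and `python_is_recent_enough` is a hypothetical helper name.

```python
import sys

# Threshold taken from the observation in this answer (assumption, not a documented spaCy constant)
MIN_VERSION = (3, 5)


def python_is_recent_enough(version_info=None, minimum=MIN_VERSION):
    """Return True if the (major, minor) version is at least `minimum`."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == '__main__':
    current = '{}.{}.{}'.format(*sys.version_info[:3])
    if python_is_recent_enough():
        print('Python {} meets the {}.{} threshold'.format(current, *MIN_VERSION))
    else:
        print('Python {} is below the {}.{} threshold'.format(current, *MIN_VERSION))
```

The helper accepts an explicit version tuple, so the server's reported version can be checked from any machine with `python_is_recent_enough((3, 4, 2))`.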

answered May 30 '26 20:05 by Thoc


