Loading saved NER back into HuggingFace pipeline?

I am doing some research into HuggingFace's functionality for transfer learning (specifically, named entity recognition). To preface, I am fairly new to transformer architectures. I briefly walked through the example from their website:

from transformers import pipeline

nlp = pipeline("ner")

sequence = "Hugging Face Inc. is a company based in New York City. Its headquarters are in DUMBO, therefore very" \
       "close to the Manhattan Bridge which is visible from the window."

print(nlp(sequence))

What I would like to do is save and run this locally without having to download the "ner" model every time (it is over 1 GB in size). In their documentation, I see that you can save the pipeline to a local folder using the "pipeline.save_pretrained()" function. Doing so produces several files, which I am storing in a specific folder.
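
For reference, the saving step I am using looks roughly like this ("saved_ner" is just the folder name I chose):

from transformers import pipeline

# Download the default NER pipeline once, then write its model and
# tokenizer files to a local folder ("saved_ner" is my own folder name)
nlp = pipeline("ner")
nlp.save_pretrained("saved_ner")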

My question is: how can I load this model back into a script after saving, so I can continue classifying as in the example above? The output of "pipeline.save_pretrained()" is multiple files.

Here is what I have tried so far:

1: Following the documentation about pipeline

pipe = transformers.TokenClassificationPipeline(model="pytorch_model.bin", tokenizer='tokenizer_config.json')

The error I got was: 'str' object has no attribute 'config'
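
My guess is that the pipeline class wants actual model and tokenizer objects rather than file names, so maybe something like this sketch is closer to what it expects (the folder path is a placeholder for my save_pretrained() output):

from transformers import (AutoModelForTokenClassification, AutoTokenizer,
                          TokenClassificationPipeline)

# Load the saved model and tokenizer as objects first, then hand the
# objects (not file names) to the pipeline class
model = AutoModelForTokenClassification.from_pretrained("saved_ner")
tokenizer = AutoTokenizer.from_pretrained("saved_ner")
pipe = TokenClassificationPipeline(model=model, tokenizer=tokenizer)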

2: Following HuggingFace example on ner:

from transformers import AutoModelForTokenClassification, AutoTokenizer
import torch

model = AutoModelForTokenClassification.from_pretrained("path to folder following .save_pretrained()")
tokenizer = AutoTokenizer.from_pretrained("path to folder following .save_pretrained()")

label_list = [
    "O",       # Outside of a named entity
    "B-MISC",  # Beginning of a miscellaneous entity right after another miscellaneous entity
    "I-MISC",  # Miscellaneous entity
    "B-PER",   # Beginning of a person's name right after another person's name
    "I-PER",   # Person's name
    "B-ORG",   # Beginning of an organisation right after another organisation
    "I-ORG",   # Organisation
    "B-LOC",   # Beginning of a location right after another location
    "I-LOC"    # Location
]

sequence = "Hugging Face Inc. is a company based in New York City. Its headquarters are in DUMBO, therefore very" \
       "close to the Manhattan Bridge."

# Bit of a hack to get the tokens with the special tokens
tokens = tokenizer.tokenize(tokenizer.decode(tokenizer.encode(sequence)))
inputs = tokenizer.encode(sequence, return_tensors="pt")

outputs = model(inputs)[0]
predictions = torch.argmax(outputs, dim=2)

print([(token, label_list[prediction]) for token, prediction in zip(tokens, predictions[0].tolist())])

This yields an error: list index out of range

I also tried printing out just "predictions", but that does not give me the tokens as text alongside their entities.
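
I am also wondering whether reading the label mapping from the model config (model.config.id2label) instead of hard-coding label_list would avoid the index error; a rough sketch of what I mean (the folder path is again a placeholder):

import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

# "saved_ner" stands in for the folder written by save_pretrained()
model = AutoModelForTokenClassification.from_pretrained("saved_ner")
tokenizer = AutoTokenizer.from_pretrained("saved_ner")

sequence = "Hugging Face Inc. is a company based in New York City."

inputs = tokenizer(sequence, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Pick the highest-scoring label id for each token position
predictions = torch.argmax(outputs.logits, dim=2)
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])

# Map label ids to label strings using the model's own config
print([(token, model.config.id2label[pred.item()])
       for token, pred in zip(tokens, predictions[0])])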

Any help would be much appreciated!

asked Oct 28 '25 by rmahesh

1 Answer

Loading a model like this has always worked for me:

from transformers import pipeline

pipe = pipeline('token-classification', model=model_folder, tokenizer=model_folder)

Have a look at the documentation for further examples on how to use pipelines.
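
Put together with the folder you saved earlier, a full run might look like this (the folder path is just an example):

from transformers import pipeline

# Point both model and tokenizer at the folder written by save_pretrained()
# ("saved_ner" is an example path, replace it with your own)
model_folder = "saved_ner"
pipe = pipeline("token-classification", model=model_folder, tokenizer=model_folder)

sequence = ("Hugging Face Inc. is a company based in New York City. "
            "Its headquarters are in DUMBO, therefore very close to "
            "the Manhattan Bridge which is visible from the window.")

print(pipe(sequence))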

answered Oct 30 '25 by ClaudiaR

