Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to apply a pretrained transformer model from huggingface?

I am interested in using pre-trained models from Hugging Face for named entity recognition (NER) tasks without further training or testing of the model.

On the model page of Hugging Face, the only information for reusing the model are as follows:

from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
model = AutoModel.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")

I tried the following code, but I am getting a tensor output instead of class labels for each named entity.

from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
model = AutoModel.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")

text = "my text for named entity recognition here."

input_ids = torch.tensor(tokenizer.encode(text, padding=True, truncation=True,max_length=50, add_special_tokens = True)).unsqueeze(0)

with torch.no_grad():
  output = model(input_ids, output_attentions=True)

Any suggestions on how to apply the model on a text for NER?

like image 540
Hanif Avatar asked Sep 07 '25 22:09

Hanif


1 Answers

In transformers NER is done with the TokenClassificationPipeLine:

from transformers import AutoTokenizer, pipeline,  AutoModelForTokenClassification
tokenizer = AutoTokenizer.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
model = AutoModelForTokenClassification.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
nerpipeline = pipeline('ner', model=model, tokenizer=tokenizer)
text = "my text for named entity recognition here."
nerpipeline(text)

Output:

[{'word': 'my',
  'score': 0.5209763050079346,
  'entity': 'LABEL_0',
  'index': 1,
  'start': 0,
  'end': 2},
 {'word': 'text',
  'score': 0.5161970257759094,
  'entity': 'LABEL_0',
  'index': 2,
  'start': 3,
  'end': 7},
 {'word': 'for',
  'score': 0.5297629237174988,
  'entity': 'LABEL_1',
  'index': 3,
  'start': 8,
  'end': 11},
 {'word': 'named',
  'score': 0.5258920788764954,
  'entity': 'LABEL_1',
  'index': 4,
  'start': 12,
  'end': 17},
 {'word': 'entity',
  'score': 0.5415489673614502,
  'entity': 'LABEL_1',
  'index': 5,
  'start': 18,
  'end': 24},
 {'word': 'recognition',
  'score': 0.5396601557731628,
  'entity': 'LABEL_1',
  'index': 6,
  'start': 25,
  'end': 36},
 {'word': 'here',
  'score': 0.5165827870368958,
  'entity': 'LABEL_0',
  'index': 7,
  'start': 37,
  'end': 41},
 {'word': '.',
  'score': 0.5266348123550415,
  'entity': 'LABEL_0',
  'index': 8,
  'start': 41,
  'end': 42}]

Please note that you need to use AutoModelForTokenClassification instead of AutoModel and that not all models have a trained head for token classification, i.e. you will get random weights for the token classification head :)

like image 156
cronoik Avatar answered Sep 11 '25 01:09

cronoik