
How to truncate input in the Huggingface pipeline?

I currently use a huggingface pipeline for sentiment-analysis like so:

from transformers import pipeline
classifier = pipeline('sentiment-analysis', device=0)

The problem is that when I pass texts longer than 512 tokens, it crashes with an error saying the input is too long. Is there any way of passing the max_length and truncation parameters from the tokenizer directly to the pipeline?

My workaround is to load the model and tokenizer explicitly:

from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "nlptown/bert-base-multilingual-uncased-sentiment"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
classifier = pipeline('sentiment-analysis', model=model, tokenizer=tokenizer, device=0)

And then when I call the tokenizer:

pt_batch = tokenizer(text, padding=True, truncation=True, max_length=512, return_tensors="pt")
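One way to sanity-check that the truncation actually takes effect is to inspect the shape of the encoded batch; a minimal sketch using the same model name as above (the repeated string is just a stand-in for a long input):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "nlptown/bert-base-multilingual-uncased-sentiment"
)

# A text far longer than the model's 512-token limit
long_text = "great movie " * 1000

# Without truncation the encoding would exceed 512 tokens;
# with truncation=True and max_length=512 it is capped at 512.
pt_batch = tokenizer(long_text, padding=True, truncation=True,
                     max_length=512, return_tensors="pt")
print(tuple(pt_batch["input_ids"].shape))
```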

But it would be much nicer to simply be able to call the pipeline directly like so:

classifier(text, padding=True, truncation=True, max_length=512)
Asked Sep 08 '25 by EtienneT

1 Answer

You can pass tokenizer_kwargs to the pipeline call at inference time:

model_pipeline = pipeline("text-classification", model=model, tokenizer=tokenizer, device=0, return_all_scores=True)

# Note: don't include return_tensors here; the pipeline sets it internally
tokenizer_kwargs = {'padding': True, 'truncation': True, 'max_length': 512}

prediction = model_pipeline('sample text to predict', **tokenizer_kwargs)

For more details, you can check this link.

Answered Sep 11 '25 by Bhupendra Singh Rathore