 

PyTorch model evaluation slow when deployed on Kubernetes

I would like to make the result of a text classification model (finBERT, a PyTorch model) available through an endpoint deployed on Kubernetes.

The whole pipeline works, but it is very slow when deployed (30 seconds for one sentence). If I time the same endpoint locally, I get results in 1 or 2 seconds. Running the Docker image locally, the endpoint also takes about 2 seconds to return a result.

When I check the CPU usage of my Kubernetes instance while the request is running, it doesn't go above 35%, so I'm not sure the problem is a lack of compute power.

Has anyone seen such performance issues when running a forward pass through a PyTorch model? Any clues on what I should investigate?

Any help is greatly appreciated, thank you!

I am currently using the following resource settings:

    limits:
      cpu: "2"
    requests:
      cpu: "1"

Python: 3.7, PyTorch: 1.8.1
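
For anyone investigating something similar, a quick sanity check is to compare the thread count PyTorch auto-detects with what the pod is actually allowed, and to time a bare forward pass. A minimal sketch (the Linear layer is just a stand-in for whatever model you have loaded; it is not the finBERT setup from the question):

    # Compare what torch auto-detects with the container's apparent CPU count.
    import os
    import time
    import torch

    print("os.cpu_count()        :", os.cpu_count())          # host cores, ignores the cgroup limit
    print("torch.get_num_threads:", torch.get_num_threads())  # intra-op threads torch will use

    # Time a bare forward pass on a dummy input; any loaded model works the same way.
    model = torch.nn.Linear(768, 3)  # stand-in for the real classifier
    x = torch.randn(1, 768)
    start = time.perf_counter()
    with torch.no_grad():
        _ = model(x)
    print(f"forward pass took {time.perf_counter() - start:.4f}s")

If torch reports far more threads than the pod's CPU limit, thread oversubscription is a likely culprit.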

asked Oct 25 '25 by move_ludwig
1 Answer

I had the same issue. Locally my PyTorch model would return a prediction in 25 ms, but on Kubernetes it would take 5 seconds. The problem had to do with how many threads torch had available to use. I'm not 100% sure why this works, but reducing the number of threads sped up performance significantly. (Most likely PyTorch detects every core on the host node and spawns that many OpenMP threads, which then contend for the pod's much smaller CPU quota.)

Set the following environment variable on your Kubernetes pod:

    OMP_NUM_THREADS=1

After doing that, it performed on Kubernetes like it did running locally, ~30 ms per call.
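
If it is more convenient to control this from the serving code than from the pod spec, an in-process equivalent (my assumption, not something the original answer mentions) looks like this:

    import os

    # Must be set before torch is imported, or OpenMP will not pick it up.
    os.environ["OMP_NUM_THREADS"] = "1"

    import torch

    # torch's own intra-op thread count; setting this as well is a safe belt-and-braces.
    torch.set_num_threads(1)

Setting it in the pod spec is still the cleaner option, since it applies before any Python code runs.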

These are my pod limits:

  • cpu limit: 1
  • memory limit: 1500m

This blog post led me to the fix: https://www.chunyangwen.com/blog/python/pytorch-slow-inference.html

answered Oct 27 '25 by LogDog23

