Espresso ANERuntimeEngine Program Inference overflow

Question

I have two CoreML models. One works fine, and the other generates this error message:

[espresso] [Espresso::ANERuntimeEngine::__forward_segment 0] evaluate[RealTime]WithModel returned 0; code=5 err=Error Domain=com.apple.appleneuralengine Code=5 "processRequest:qos:qIndex:error:: 0x3: Program Inference overflow" UserInfo={NSLocalizedDescription=processRequest:qos:qIndex:error:: 0x3: Program Inference overflow}
[espresso] [Espresso::overflow_error] /var/containers/Bundle/Application/E0DE5E08-D2C6-48AF-91B2-B42BA7877E7E/xxx demoapp.app/mpii-hg128.mlmodelc/model.espresso.net:0

Both models are very similar Conv2D models. They are generated with the same scripts and the same versions of PyTorch, ONNX, and onnx-coreml. The model that works has 1036 layers, and the model that generates the error has 599 layers. Both use only standard layers: Conv2D, BatchNorm, ReLU, MaxPool, and Upsample (no custom layers and no Functional or NumPy stuff). Both use roughly the same number of features per layer. They follow essentially the same structure, except that the erroring model skips a MaxPool layer at the start (hence the higher output resolution).

They both take a 256x256 color image as input and output 16 channels, at 64x64 pixels (working model) and 128x128 pixels (erroring model).

The app does not crash, but gives garbage results for the erroring model.

Both models train, evaluate, etc. fine in their native formats (PyTorch).

I have no idea what a Code=5 "processRequest:qos:qIndex:error:: 0x3: Program Inference overflow" error is, and Google searches are not turning up anything useful, since I gather "Espresso" and "ANERuntimeEngine" are both private Apple libraries.

What is this error message telling me? How can I fix it?

Can I avoid this error message by running the model on the CPU/GPU instead of on the Bionic chip's Neural Engine?

Any help is appreciated, thanks.

Matthijs Hollemans · Accepted Answer

That's a LOT of layers!

Espresso is the C++ library that runs the Core ML models. ANERuntimeEngine is used with the Apple Neural Engine chip.

By passing in an MLModelConfiguration with computeUnits set to .cpuAndGPU when you load the Core ML model, you can tell Core ML to not use the Neural Engine.
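A minimal sketch of that configuration in Swift follows. The helper names `makeCPUAndGPUConfiguration` and `loadModelWithoutNeuralEngine` and the `modelURL` parameter are made up for illustration; only `MLModelConfiguration`, `computeUnits`, `.cpuAndGPU`, and `MLModel(contentsOf:configuration:)` come from Core ML itself.

```swift
import CoreML
import Foundation

// Build a configuration that restricts Core ML to the CPU and GPU,
// so the Neural Engine is not used for this model.
func makeCPUAndGPUConfiguration() -> MLModelConfiguration {
    let configuration = MLModelConfiguration()
    configuration.computeUnits = .cpuAndGPU
    return configuration
}

// Load a compiled model with that configuration.
// `modelURL` is assumed to point at the compiled .mlmodelc folder
// inside the app bundle.
func loadModelWithoutNeuralEngine(at modelURL: URL) throws -> MLModel {
    return try MLModel(contentsOf: modelURL,
                       configuration: makeCPUAndGPUConfiguration())
}
```

If you use the class that Xcode generates for the model instead of `MLModel` directly, it should accept the same configuration object through its `init(configuration:)` initializer. Setting `computeUnits` to `.cpuOnly` is another option when you want to compare results and check whether the garbage output is specific to the Neural Engine.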

Tags: ios, swift, pytorch, coreml, onnx

Stephen Furlani
