
Espresso ANERuntimeEngine Program Inference overflow

I have two CoreML models. One works fine, and the other generates this error message:

[espresso] [Espresso::ANERuntimeEngine::__forward_segment 0] evaluate[RealTime]WithModel returned 0; code=5 err=Error Domain=com.apple.appleneuralengine Code=5 "processRequest:qos:qIndex:error:: 0x3: Program Inference overflow" UserInfo={NSLocalizedDescription=processRequest:qos:qIndex:error:: 0x3: Program Inference overflow}
[espresso] [Espresso::overflow_error] /var/containers/Bundle/Application/E0DE5E08-D2C6-48AF-91B2-B42BA7877E7E/xxx demoapp.app/mpii-hg128.mlmodelc/model.espresso.net:0

Both models are very similar (both Conv2D models). They are generated with the same scripts and the same versions of PyTorch, ONNX, and onnx-coreml. The model that works has 1036 layers, and the model that generates the error has 599 layers. They both use standard layers (Conv2D, BatchNorm, ReLU, MaxPool, and Upsample; no custom layers and no Functional or NumPy operations). They use roughly the same number of features per layer and follow essentially the same structure, except that the erroring model skips a MaxPool layer at the start (hence the higher output resolution).

They both take a 256x256 color image as input, and output 16 channels at 64x64 pixels (working model) and 128x128 pixels (erroring model).

The app does not crash, but gives garbage results for the erroring model.

Both models train, evaluate, etc. fine in their native formats (PyTorch).

I have no idea what a Code=5 "processRequest:qos:qIndex:error:: 0x3: Program Inference overflow" error is, and Google searches are not yielding anything productive; I gather "Espresso" and "ANERuntimeEngine" are both private Apple libraries.

What is this error message telling me? How can I fix it?

Can I avoid this error message by not running the model on the Bionic chip, but on the CPU/GPU instead?

Any help is appreciated, thanks.

Asked Feb 19 '19 by Stephen Furlani

1 Answer

That's a LOT of layers!

Espresso is the private C++ library that runs Core ML models. ANERuntimeEngine is the part of it that runs them on the Apple Neural Engine (ANE).

You can tell Core ML not to use the Neural Engine by passing in an MLModelConfiguration with computeUnits set to .cpuAndGPU when you load the model.
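A minimal sketch of what that looks like in Swift (the model filename here matches the path in your error log; adjust it to however your app actually loads the model):

```swift
import CoreML

// Configure Core ML to skip the Neural Engine and fall back to CPU/GPU.
// .cpuOnly is even more conservative; .all (the default) allows the ANE.
let config = MLModelConfiguration()
config.computeUnits = .cpuAndGPU

// Load the compiled model with that configuration.
let modelURL = Bundle.main.url(forResource: "mpii-hg128",
                               withExtension: "mlmodelc")!
let model = try MLModel(contentsOf: modelURL, configuration: config)
```

If Xcode generated a wrapper class for your model, you can pass the same configuration to its `init(configuration:)` initializer instead. If the garbage output disappears with `.cpuAndGPU`, that is a strong hint the overflow happens only in the ANE's lower-precision arithmetic.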

Answered Sep 30 '22 by Matthijs Hollemans