What I have: A trained recurrent neural network in TensorFlow.
What I want: A mobile application that can run this network as fast as possible (inference mode only, no training).
I believe there are multiple ways to accomplish my goal, but I would like your feedback, corrections, and additions, because I have never done this before.
Some details about the mobile application: it will take a sound recording from the user, do some processing (like speech-to-text) and output the text. I am not looking for a solution that is merely "fast enough", but for the fastest option, because it will run over very large sound files, so almost every speed improvement counts. Do you have any advice on how I should approach this problem?
Last question: if I try to hire somebody to help me out, should I look for an Android/iOS, embedded, or TensorFlow kind of person?
The context units in a Jordan network are also referred to as the state layer. They have a recurrent connection to themselves. Elman and Jordan networks are also known as "Simple recurrent networks" (SRN).
The backpropagation algorithm of an artificial neural network is modified to include the unfolding in time needed to train the weights of a recurrent network. This gradient-based algorithm is called backpropagation through time, or BPTT for short. A sketch of one training step is given below.
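Here is a minimal NumPy sketch of BPTT for a vanilla tanh RNN standing in for that pseudo-code; the weight names (`Wxh`, `Whh`, `Why`) and the squared-error loss are illustrative assumptions, not a canonical formulation:

```python
import numpy as np

def bptt_step(inputs, targets, Wxh, Whh, Why, h0, lr=0.01):
    """One training step over a sequence via backpropagation through time."""
    T = len(inputs)
    hs = {-1: h0}
    ys = {}
    # Forward pass: unfold the network over T time steps.
    for t in range(T):
        hs[t] = np.tanh(Wxh @ inputs[t] + Whh @ hs[t - 1])
        ys[t] = Why @ hs[t]
    # Backward pass: accumulate gradients back through the unfolded graph.
    dWxh, dWhh, dWhy = np.zeros_like(Wxh), np.zeros_like(Whh), np.zeros_like(Why)
    dh_next = np.zeros_like(h0)
    for t in reversed(range(T)):
        dy = ys[t] - targets[t]            # squared-error loss gradient
        dWhy += dy @ hs[t].T
        dh = Why.T @ dy + dh_next          # gradient flowing into h_t
        dhraw = (1 - hs[t] ** 2) * dh      # backprop through tanh
        dWxh += dhraw @ inputs[t].T
        dWhh += dhraw @ hs[t - 1].T
        dh_next = Whh.T @ dhraw            # pass gradient to the previous step
    # Plain gradient-descent update.
    for W, dW in ((Wxh, dWxh), (Whh, dWhh), (Why, dWhy)):
        W -= lr * dW
    return hs[T - 1]
```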
Because of their internal memory, RNNs can remember important things about the input they have received, which allows them to be precise in predicting what comes next. This is why they are the preferred model for sequential data such as time series, speech, text, financial data, audio, video, and weather.
The Neural Magic Inference Engine works by optimizing how a neural network is executed across the available memory hierarchies in a CPU. Its algorithms identify memory-bound components within the network, such as depthwise convolutions, and apply optimization techniques to accelerate them.
That said, MobileNetV2 can be pruned further if the accuracy trade-off is acceptable.
The graph shows the maximum IPS (images per second) the Neural Magic Inference Engine achieved with MobileNetV2 at batch size 1, FP32, on a 4-core CPU: 12.7x better performance than the plain baseline on the same 4-core CPU, 4.5x better than DNNL, and 1.2x better than OpenVINO.
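For a sense of what running a model through the engine looks like, here is a hypothetical sketch using the `deepsparse` Python package (the engine's later open-source release); the ONNX filename and input shape are placeholders:

```python
import numpy as np
from deepsparse import compile_model

# Compile an ONNX model for batch-size-1 CPU inference.
# "mobilenet_v2.onnx" is a placeholder path, not from the original post.
engine = compile_model("mobilenet_v2.onnx", batch_size=1)

# Run one FP32 image through the engine.
inputs = [np.random.randn(1, 3, 224, 224).astype(np.float32)]
outputs = engine.run(inputs)
```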
1. TensorFlow Lite
Pros: it uses GPU optimizations on Android; it is fairly easy to incorporate into a Swift/Objective-C app, and very easy into Java/Android (just adding one line in build.gradle); you can also convert a TF model to CoreML.
Cons: if you use the C++ library, you will have some issues adding TFLite as a library to your Android/Java app via JNI (there is no native way to build such a library without JNI); no GPU support on iOS (though the community is working on MPS integration).
Also, here is a reference to a TFLite speech-to-text demo app; it could be useful.
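For reference, converting a trained model to a TFLite flatbuffer takes only a few lines in Python; this sketch assumes a TF 2.x SavedModel export and the default optimization setting, both of which are my assumptions rather than part of the original answer:

```python
import tensorflow as tf

# Convert an exported SavedModel to a TFLite flatbuffer.
# "saved_model_dir" is a placeholder for wherever you exported your RNN.
converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")
# Optional: default optimizations (weight quantization) for a smaller, faster model.
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

with open("speech2text.tflite", "wb") as f:
    f.write(tflite_model)
```

On the Android side, the one-line Gradle dependency mentioned above is `implementation 'org.tensorflow:tensorflow-lite:+'` in `build.gradle`.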
2. TensorRT
TensorRT uses cuDNN, which in turn uses the CUDA library. There is a version of CUDA for Android, but I am not sure whether it supports the full functionality.
3. Custom code + Libraries
I would recommend the Android NNAPI (Neural Networks API) and CoreML; in case you need to go deeper, you can use the Eigen library for linear algebra. However, writing your own custom code is not beneficial in the long term: you would need to support, test and improve it, which is a huge undertaking and usually matters more than the performance gain.
4. Re-implement everything
This option is very similar to the previous one: implementing your own RNN (LSTM) should be fine as long as you know what you are doing; just use one of the linear algebra libraries (e.g. Eigen). A sketch of what a single step involves is given below.
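To make the scope concrete, here is a minimal NumPy sketch of one hand-rolled LSTM cell update; with Eigen the same few matrix products translate almost line-for-line into C++. The stacked-gate layout of `W`, `U`, and `b` is an illustrative assumption:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM cell update; W, U, b hold the four gate parameters stacked row-wise."""
    z = W @ x + U @ h_prev + b       # all four gate pre-activations in one product
    H = h_prev.shape[0]
    i = sigmoid(z[0:H])              # input gate
    f = sigmoid(z[H:2 * H])          # forget gate
    o = sigmoid(z[2 * H:3 * H])      # output gate
    g = np.tanh(z[3 * H:4 * H])      # candidate cell state
    c = f * c_prev + i * g           # new cell state
    h = o * np.tanh(c)               # new hidden state
    return h, c
```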
The overall recommendation would be to start with TensorFlow Lite, and only fall back to custom code if profiling shows it cannot meet your speed requirements.