I have recently started exploring Recurrent Neural Networks. So far I have trained a character-level language model in TensorFlow, following Andrej Karpathy's blog. It works great.
However, I couldn't find any study on using RNNs for string matching or keyword spotting. For one of my projects I need to OCR scanned documents and then parse the converted text for key data points. Most string matching techniques fail to account for OCR conversion mistakes, and that leads to significant error.
Is it possible to train an RNN on the variations of converted text I receive and use it for finding keywords?
This paper may be the thing you are looking for:
[1608.02214] Robsut Wrod Reocginiton via semi-Character Recurrent Neural Network
A brief introduction:
The authors of this paper demonstrate a method for recognizing jumbled words such as "Cmabrigde Uinervtisy" (Cambridge University). Each word is given to the network as its correct first and last characters plus an encoding of the internal characters that carries no position information, and from this the network learns to recognize and correct the word.
You can easily modify the network structure to fit your own needs, such as the OCR case you mentioned.
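To make the encoding concrete, here is a minimal sketch of that kind of input vector: a one-hot for the first letter, an order-free count of the internal letters, and a one-hot for the last letter. The function name `semi_character_vector`, the lowercase a-z alphabet, and the handling of one-letter words are assumptions made for illustration, not details taken from the paper.

```python
import string

# Assumption: a lowercase a-z alphabet; anything else is ignored.
ALPHABET = string.ascii_lowercase
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}
N = len(ALPHABET)


def semi_character_vector(word):
    """Return [first one-hot | internal letter counts | last one-hot] as a flat list."""
    letters = [ch for ch in word.lower() if ch in INDEX]
    first = [0] * N
    internal = [0] * N
    last = [0] * N
    if letters:
        first[INDEX[letters[0]]] = 1
    if len(letters) > 1:
        last[INDEX[letters[-1]]] = 1
    # Internal letters are only counted, so their order is discarded.
    for ch in letters[1:-1]:
        internal[INDEX[ch]] += 1
    return first + internal + last


# The jumbled and the correct spelling map to the same vector.
print(semi_character_vector("Cmabrigde") == semi_character_vector("Cambridge"))
```

Because the internal letters are stored as counts, any reordering of them produces the same input. A changed first or last letter, or a substituted internal letter, still produces a different vector.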
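One possible way to adapt this to OCR is to train on (noisy, clean) word pairs in which the noise imitates the substitutions your OCR engine actually makes. The sketch below only generates such pairs. The `CONFUSIONS` table, the function names, and the default rate are all hypothetical and would need to be replaced with confusions observed in your own converted text.

```python
import random

# Hypothetical OCR confusions; replace with the ones seen in your own data.
CONFUSIONS = {"l": "1", "o": "0", "s": "5", "m": "rn", "e": "c"}


def add_ocr_noise(word, rate, rng):
    """Replace each confusable character with probability `rate`."""
    out = []
    for ch in word:
        if ch in CONFUSIONS and rng.random() < rate:
            out.append(CONFUSIONS[ch])
        else:
            out.append(ch)
    return "".join(out)


def make_training_pairs(keywords, variants_per_word=3, rate=0.3, seed=0):
    """Return a list of (noisy_word, clean_word) pairs for supervised training."""
    rng = random.Random(seed)  # seeded so the generated data is reproducible
    return [
        (add_ocr_noise(word, rate, rng), word)
        for word in keywords
        for _ in range(variants_per_word)
    ]


for noisy, clean in make_training_pairs(["total", "invoice"]):
    print(noisy, "->", clean)
```

The noisy word would be the network input and the clean keyword the target. Real OCR output paired with hand-corrected text is a better source of pairs when it is available.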