
String Matching Using Recurrent Neural Networks

I have recently started exploring Recurrent Neural Networks. So far I have trained a character-level language model in TensorFlow, following Andrej Karpathy's blog. It works great.

However, I couldn't find any study on using RNNs for string matching or keyword spotting. One of my projects requires running OCR on scanned documents and then parsing the converted text for key data points. Most string matching techniques do not account for OCR conversion mistakes, which leads to significant error.

Is it possible to train an RNN on the variations of converted text I receive and use it to find keywords?

Fahad Sarfraz asked Dec 01 '15 08:12



1 Answer

This paper may be the thing you are looking for:

[1608.02214] Robsut Wrod Reocginiton via semi-Character Recurrent Neural Network

A brief introduction:

The authors of this paper demonstrate a method for recognizing jumbled words such as "Cmabrigde Uinervtisy" (Cambridge University). Each word is fed to the network as its correct first character, its correct last character, and an encoding of the internal characters that carries no position information. Trained on this input, the neural network learns to recognize and correct the jumbled words.
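To make the encoding concrete, here is a minimal sketch of a semi-character vector as described above: a one-hot block for the first letter, a position-free count of the internal letters, and a one-hot block for the last letter. The function name `semi_character_vector`, the lowercase a-z alphabet, and the handling of one-letter words are my own assumptions for illustration, not code from the paper.

```python
import string

# Assumption: a lowercase a-z alphabet; anything else is ignored.
ALPHABET = string.ascii_lowercase
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def semi_character_vector(word):
    """Return [first one-hot | internal letter counts | last one-hot]."""
    letters = [ch for ch in word.lower() if ch in INDEX]
    n = len(ALPHABET)
    first, internal, last = [0] * n, [0] * n, [0] * n
    if not letters:
        return first + internal + last
    first[INDEX[letters[0]]] = 1
    # Assumption: a one-letter word only fills the "first" block.
    if len(letters) > 1:
        last[INDEX[letters[-1]]] = 1
    # Internal letters are counted, so their order is discarded.
    for ch in letters[1:-1]:
        internal[INDEX[ch]] += 1
    return first + internal + last


# A jumbled word and its correct spelling map to the same vector.
print(semi_character_vector("Cmabrigde") == semi_character_vector("Cambridge"))
```

A sequence of these vectors, one per word, would then be the input to the recurrent network, with the correctly spelled word as the target at each step.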

You should be able to modify the network structure to fit your own need, OCR, as you mentioned.
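One way to approach the OCR case is to train on pairs of (noisy text, correct keyword). The sketch below generates such pairs synthetically. Note that OCR mistakes are mostly character substitutions rather than jumbled order, so the confusion table `OCR_CONFUSIONS` is a hypothetical placeholder; you would replace it with confusions actually observed in your converted text. The helper names `ocr_variant` and `training_pairs` are also assumptions made for illustration.

```python
import random

# Hypothetical confusion table; replace with pairs seen in your own OCR output.
OCR_CONFUSIONS = {"l": ["1", "I"], "o": ["0"], "m": ["rn"], "e": ["c"], "s": ["5"]}


def ocr_variant(word, rate=0.2, rng=None):
    """Return a copy of word with some characters swapped for OCR look-alikes."""
    rng = rng or random.Random()
    out = []
    for ch in word:
        options = OCR_CONFUSIONS.get(ch.lower())
        if options and rng.random() < rate:
            out.append(rng.choice(options))
        else:
            out.append(ch)
    return "".join(out)


def training_pairs(keywords, per_word=5, rate=0.2, seed=0):
    """Build (noisy input, clean target) pairs for each keyword."""
    rng = random.Random(seed)
    return [(ocr_variant(w, rate, rng), w) for w in keywords for _ in range(per_word)]


for noisy, clean in training_pairs(["invoice", "total"], per_word=3):
    print(noisy, "->", clean)
```

If you have real OCR output with known ground truth, pairs taken from that data are likely to be a better training set than synthetic noise.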



allenyllee answered Nov 10 '22 03:11
