
OCR lib for math formulas

Tags:

ocr

I need an open OCR library that can recognize complex printed math formulas (for example, formulas generated with LaTeX). I want to get LaTeX-like output (or just some AST-like data).
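To make "AST-like data" concrete, here is a minimal sketch of the kind of output I have in mind. The node types (`Sym`, `Frac`, `Sup`, `Row`) and the `to_latex` helper are hypothetical names invented for illustration; they are not taken from any existing OCR library.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Sym:
    """A single symbol or token, e.g. 'x', '2', '+'."""
    name: str


@dataclass
class Frac:
    """A fraction with a numerator and a denominator subtree."""
    num: "Node"
    den: "Node"


@dataclass
class Sup:
    """A base with a superscript (exponent)."""
    base: "Node"
    exp: "Node"


@dataclass
class Row:
    """A horizontal sequence of nodes."""
    items: List["Node"]


Node = Union[Sym, Frac, Sup, Row]


def to_latex(node: Node) -> str:
    """Recursively render the (hypothetical) AST as a LaTeX string."""
    if isinstance(node, Sym):
        return node.name
    if isinstance(node, Frac):
        return r"\frac{" + to_latex(node.num) + "}{" + to_latex(node.den) + "}"
    if isinstance(node, Sup):
        return "{" + to_latex(node.base) + "}^{" + to_latex(node.exp) + "}"
    if isinstance(node, Row):
        return " ".join(to_latex(item) for item in node.items)
    raise TypeError(f"unknown node type: {type(node).__name__}")


# x squared over (y + 1)
formula = Frac(Sup(Sym("x"), Sym("2")), Row([Sym("y"), Sym("+"), Sym("1")]))
print(to_latex(formula))
```

Either form would work for me: a tree like `formula` can always be rendered to LaTeX afterwards, as `to_latex` shows.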

Is there something like this already? Or are current OCR techniques only able to parse line-oriented text?

(Note that I also posted this question on Metaoptimize because some people there might have additional knowledge.)

The problem was also described by OpenAI as im2latex.

Albert asked Aug 25 '10 21:08


2 Answers

SESHAT is an open-source system written in C++ for recognizing handwritten mathematical expressions. SESHAT was developed as part of a PhD thesis at the PRHLT research center at Universitat Politècnica de València.

An online demo: http://cat.prhlt.upv.es/mer/

The source: https://github.com/falvaro/seshat

Seshat is an open-source system for recognizing handwritten mathematical expressions. Given a sample represented as a sequence of strokes, the parser is able to convert it to LaTeX or other formats like InkML or MathML.
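As a rough illustration of what "a sample represented as a sequence of strokes" can look like, the sketch below stores each stroke as a list of (x, y) points and writes them out as InkML `trace` elements. This is a minimal sketch under assumptions: the function name `strokes_to_inkml` is hypothetical, the coordinates are made up, and the exact input conventions Seshat expects should be checked against its repository rather than inferred from this example.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the W3C InkML recommendation.
INKML_NS = "http://www.w3.org/2003/InkML"


def strokes_to_inkml(strokes):
    """Serialize strokes (lists of (x, y) points) as an InkML document string.

    Each stroke becomes one <trace> element whose text is a
    comma-separated list of "x y" points.
    """
    ET.register_namespace("", INKML_NS)  # emit InkML as the default namespace
    ink = ET.Element(f"{{{INKML_NS}}}ink")
    for stroke in strokes:
        trace = ET.SubElement(ink, f"{{{INKML_NS}}}trace")
        trace.text = ", ".join(f"{x} {y}" for x, y in stroke)
    return ET.tostring(ink, encoding="unicode")


# Two made-up strokes forming a rough "x" shape.
sample = [
    [(0, 0), (10, 10)],
    [(10, 0), (0, 10)],
]
print(strokes_to_inkml(sample))
```

Note that this is stroke (online) input, not a scanned image, so it covers handwriting captured on a tablet or touchscreen rather than printed formulas.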

Slothworks answered Oct 12 '22 02:10


According to the answers on Metaoptimize and the discussion on the Tesseract mailing list, there does not yet seem to be an open/free solution that can do this.

The only solution that seems able to do it is the InftyProject, as a few other people have mentioned. I cannot verify this, however, because it is Windows-only and non-free.

Albert answered Oct 12 '22 01:10