tesseract v3.03 render PDF with searchable text example

According to the tesseract v3.03 release notes, tesseract now supports rendering PDF output with searchable text, but I don't know how to use this feature in my code.
I currently use tess-two for my Android app, so I wonder whether this feature can work on Android.

It would be great if you could give me an example that uses the tesseract API to render a PDF; I will then try to port the missing functions to the tess-two library.
Thanks in advance.

P.S. I can see the pdfrenderer file, which may handle PDF output, but I don't know how to use it with the base API.

Update: here is my attempt:

// Create a PDF renderer, passing it the tessdata path
tesseract::TessResultRenderer* renderer =
    new tesseract::TessPDFRenderer(nat->api.GetDatapath());
__android_log_print(ANDROID_LOG_ERROR, "Test_tesseract", "data path = %s", nat->api.GetDatapath());

// OCR the input file and feed the results to the renderer
if (!nat->api.ProcessPages(c_file_name, NULL, 0, renderer)) {
    __android_log_print(ANDROID_LOG_ERROR, "Test_tesseract", "process page failed");
    delete renderer;
    return;
}

// Fetch the rendered bytes before opening the output file,
// so a failure here does not leave an open FILE* behind
const char* data = NULL;
int dataLength = 0;
if (!renderer->GetOutput(&data, &dataLength)) {
    __android_log_print(ANDROID_LOG_ERROR, "Test_tesseract", "Cannot get output file");
    delete renderer;
    return;
}

FILE* fout = fopen(c_pdf_file_name, "wb");
if (fout == NULL) {
    __android_log_print(ANDROID_LOG_ERROR, "Test_tesseract", "Cannot create output file %s", c_pdf_file_name);
    delete renderer;
    return;
}
if (fwrite(data, 1, dataLength, fout) != (size_t)dataLength) {
    __android_log_print(ANDROID_LOG_ERROR, "Test_tesseract", "Short write to %s", c_pdf_file_name);
}
fclose(fout);

// data was returned by the renderer, so delete the renderer only after writing
delete renderer;
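The file-writing step at the end of the code above can be factored into a small helper that closes the file on every path that opened it and reports failure unless every byte was written. This is a minimal sketch using only the C standard I/O functions; WriteBufferToFile is a hypothetical name, not part of tesseract or tess-two.

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical helper: writes `length` bytes from `data` to `path` in
// binary mode. Returns true only if the file was opened, every byte was
// written, and the file was closed successfully. The FILE* is always
// closed once opened, so nothing leaks when a write fails.
bool WriteBufferToFile(const char* path, const char* data, int length) {
    if (path == nullptr || data == nullptr || length < 0) {
        return false;
    }
    std::FILE* fout = std::fopen(path, "wb");
    if (fout == nullptr) {
        return false;
    }
    const std::size_t expected = static_cast<std::size_t>(length);
    const std::size_t written = std::fwrite(data, 1, expected, fout);
    // fclose flushes buffered data, so its result matters as well.
    const bool closed = (std::fclose(fout) == 0);
    return written == expected && closed;
}
```

With the question's variables, the call would be WriteBufferToFile(c_pdf_file_name, data, dataLength) after GetOutput succeeds, with the renderer deleted afterwards.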

My code fails in the ProcessPages method. After adding log statements (I have trouble debugging with the NDK), I found that the PDF renderer's BeginDocument always returns false inside TessBaseAPI::ProcessPages in baseapi.cpp:

if (renderer && !renderer->BeginDocument(kUnknownTitle)) {
    success = false;
}

Am I missing something?

P.S. I use tess-two, which is built on baseapi rather than capi.
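One hedged first debugging step: in the code above, the only argument given to TessPDFRenderer is nat->api.GetDatapath(), so it is worth confirming that this path names a directory the app can actually read before calling ProcessPages. The sketch below uses the POSIX stat and access calls (available in the NDK); IsReadableDirectory is a hypothetical helper, and it only checks the directory itself, not which files the renderer expects inside it.

```cpp
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper: returns true if `path` names an existing
// directory that the current process can list and open files in.
bool IsReadableDirectory(const char* path) {
    if (path == nullptr) {
        return false;
    }
    struct stat info;
    if (stat(path, &info) != 0) {
        return false;  // path does not exist or cannot be stat'ed
    }
    if (!S_ISDIR(info.st_mode)) {
        return false;  // exists, but is not a directory
    }
    // R_OK lets us list entries; X_OK lets us open files inside it.
    return access(path, R_OK | X_OK) == 0;
}
```

For example, logging an error when IsReadableDirectory(nat->api.GetDatapath()) returns false, right before the renderer is created, separates "wrong or unreadable path" from other causes of the BeginDocument failure.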

ductran asked Feb 12 '14 at 05:02




1 Answer

Through the C API (called here from Java via JNA), it goes as follows:

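TessResultRenderer renderer = api.TessPDFRendererCreate(dataPath);
api.TessBaseAPIProcessPages1(handle, image, null, 0, renderer);
PointerByReference data = new PointerByReference();
IntByReference dataLength = new IntByReference();
api.TessResultRendererGetOutput(renderer, data, dataLength);
// getByteArray needs the int value, not the IntByReference wrapper
byte[] bytes = data.getValue().getByteArray(0, dataLength.getValue());
// then write the byte array to a file with a .pdf extension
// ("output.pdf" is only an example name; Files.write throws IOException)
java.nio.file.Files.write(java.nio.file.Paths.get("output.pdf"), bytes);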

If you have trouble following the code, check out the renderer example in this post.
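Whichever binding is used, the renderer output is a raw byte buffer plus a length. A cheap sanity check before saving it with a .pdf extension is that the buffer starts with the %PDF- header that PDF files begin with (for example %PDF-1.4). A minimal C++ sketch, to match the NDK code in the question; LooksLikePdf is a hypothetical helper name, and the check does not validate anything beyond the header.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: returns true if the buffer begins with "%PDF-",
// the header line that PDF files start with. A cheap sanity check on
// renderer output, not a full PDF validation.
bool LooksLikePdf(const char* data, int length) {
    static const char kSignature[] = "%PDF-";
    const int sig_len = static_cast<int>(sizeof(kSignature) - 1);
    if (data == nullptr || length < sig_len) {
        return false;
    }
    return std::memcmp(data, kSignature,
                       static_cast<std::size_t>(sig_len)) == 0;
}
```

In the question's code this would run on data and dataLength right after GetOutput returns, before the bytes are written to disk.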

nguyenq answered Oct 11 '22 at 00:10