According to the Tesseract v3.03 release notes, Tesseract now supports rendering PDF output with searchable text, but I don't know how to use this feature in my code.
I currently use tess-two in my Android app, so I wonder whether this feature can work on Android.
It would be great if you could give me an example that uses the Tesseract API to render a PDF; I will then try to port the missing functions to the tess-two library.
Thanks in advance.
P.S.: I can see the pdfrenderer file, which may handle rendering the PDF output, but I don't know how to use it with the base API.
Update: here is my attempt:
tesseract::TessResultRenderer* renderer = new tesseract::TessPDFRenderer(nat->api.GetDatapath());
__android_log_print(ANDROID_LOG_ERROR, "Test_tesseract", "data path = %s", nat->api.GetDatapath());
// Run OCR on the input file and feed the results to the PDF renderer.
if (!nat->api.ProcessPages(c_file_name, NULL, 0, renderer)) {
    __android_log_print(ANDROID_LOG_ERROR, "Test_tesseract", "process page failed");
    delete renderer;
    return;
}
FILE* fout = fopen(c_pdf_file_name, "wb");
if (fout == NULL) {
    __android_log_print(ANDROID_LOG_ERROR, "Test_tesseract", "Cannot create output file %s\n", c_pdf_file_name);
    delete renderer;
    return;
}
const char* data;
int dataLength;
// Fetch the rendered PDF bytes and write them to the output file.
bool boolValue = renderer->GetOutput(&data, &dataLength);
if (boolValue) {
    fwrite(data, 1, dataLength, fout);
} else {
    __android_log_print(ANDROID_LOG_ERROR, "Test_tesseract", "Cannot get output file");
}
// Close the file on both paths so the handle is not leaked when GetOutput fails.
fclose(fout);
delete renderer;
My code fails at the ProcessPages method. After adding log statements (I have trouble debugging in the NDK), I found that the PDF renderer's BeginDocument always returns false in the TessBaseAPI::ProcessPages method of baseapi.cpp:
if (renderer && !renderer->BeginDocument(kUnknownTitle)) {
    success = false;
}
Am I missing something?
P.S.: I use tess-two, which prefers baseapi to capi.
Using the C API (here called from Java through JNA bindings), it goes as follows:
TessResultRenderer renderer = api.TessPDFRendererCreate(dataPath);
api.TessBaseAPIProcessPages1(handle, image, null, 0, renderer);
PointerByReference data = new PointerByReference();
IntByReference dataLength = new IntByReference();
api.TessResultRendererGetOutput(renderer, data, dataLength);
// getByteArray expects an int length, so unwrap the IntByReference.
byte[] bytes = data.getValue().getByteArray(0, dataLength.getValue());
// Then write the bytes array to a file with a .pdf extension.
If you have trouble following the code, check out the renderer example in this post.
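The last step in the snippet above, writing the byte array to a PDF file, is left as a comment. A minimal sketch of that step follows, assuming plain java.nio file access is available. The class name PdfBytesWriter, the method writePdf and the output name "ocr-result" are hypothetical and not part of Tesseract or its bindings, and the bytes in main are only a placeholder for the real renderer output.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class PdfBytesWriter {
    // Hypothetical helper: writes the renderer output to "<outputBase>.pdf".
    static Path writePdf(byte[] bytes, String outputBase) throws IOException {
        Path target = Paths.get(outputBase + ".pdf");
        // Files.write creates the file, or truncates it if it already exists,
        // and closes it again before returning.
        return Files.write(target, bytes);
    }

    public static void main(String[] args) throws IOException {
        // Placeholder bytes standing in for the array obtained from the renderer.
        byte[] bytes = "%PDF-1.4\n".getBytes(StandardCharsets.US_ASCII);
        Path written = writePdf(bytes, "ocr-result");
        System.out.println("Wrote " + Files.size(written) + " bytes to " + written);
    }
}
```

On Android, the output path would need to point at a location the app is allowed to write to, such as its own files directory.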