I'm setting language_model_penalty_non_dict_word through a config file for Tesseract 3.01, but its value has no effect. I've tried multiple images and multiple values, but the output for each image is always the same. Another user noticed the same thing in a comment on another question.
Edit: After looking inside the source, I found that the variable language_model_penalty_non_dict_word is used only inside the function LanguageModel::ComputeAdjustedPathCost().
However, this function is never called! It is referenced by only two functions, LanguageModel::UpdateBestChoice() and LanguageModel::AddViterbiStateEntry(). I placed breakpoints in those functions, but they weren't being called either.
After some debugging, I finally found the reason: the function Wordrec::SegSearch() wasn't being called (and it sits above LanguageModel::ComputeAdjustedPathCost() in the call graph).
The call to SegSearch() is gated by this code:
    if (enable_new_segsearch) {
      SegSearch(&chunks_record, word->best_choice,
                best_char_choices, word->raw_choice, state);
    } else {
      best_first_search(&chunks_record, best_char_choices, word,
                        state, fixpt, best_state);
    }
So you need to set enable_new_segsearch in the config file:
enable_new_segsearch 1
language_model_penalty_non_freq_dict_word 0.2
language_model_penalty_non_dict_word 0.3
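If you want to try several penalty values against the same image, it can help to generate the config file instead of editing it by hand. Below is a minimal C++ sketch of that idea. The helper names (RenderConfig, WriteConfig, BuildCommand) and the file names in the usage note are hypothetical, not part of Tesseract. It assumes the 3.x command-line form in which config files are passed as trailing arguments after the output base. Where Tesseract looks for the config file (the tessdata/configs directory or a direct path) may depend on your installation.

```cpp
#include <cassert>
#include <fstream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: render parameters as "name value" lines,
// one per line, which is the layout of the config shown above.
// std::map keeps the names sorted, so the output order is stable.
std::string RenderConfig(const std::map<std::string, std::string>& params) {
  std::ostringstream out;
  for (const auto& kv : params) {
    out << kv.first << ' ' << kv.second << '\n';
  }
  return out.str();
}

// Hypothetical helper: write the rendered config to `path`.
// Returns false if the file could not be opened or written.
bool WriteConfig(const std::string& path,
                 const std::map<std::string, std::string>& params) {
  std::ofstream file(path);
  if (!file) return false;
  file << RenderConfig(params);
  return static_cast<bool>(file);
}

// Hypothetical helper: build the command line, assuming the form
//   tesseract <image> <outputbase> <configfile>
// No shell quoting is done, so paths must not contain spaces.
std::string BuildCommand(const std::string& image,
                         const std::string& output_base,
                         const std::string& config_path) {
  assert(!image.empty() && !output_base.empty() && !config_path.empty());
  return "tesseract " + image + " " + output_base + " " + config_path;
}
```

For example, you could call WriteConfig("segsearch.cfg", params) with enable_new_segsearch set to "1" plus the two penalty values, then run the string returned by BuildCommand("page.png", "out", "segsearch.cfg"). Keeping enable_new_segsearch in the generated file every time matters, because without it the penalty parameters are never read.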