Where can I find documentation on the ARPA language model format?
I am developing a simple speech recognition app with the PocketSphinx STT engine. ARPA is recommended there for performance reasons. I want to understand how much I can do to adjust my language model for my custom needs.
All I found is some very brief ARPA format descriptions:
I am a beginner with STT and I have trouble wrapping my head around this (n-grams, etc.). I am looking for more detailed docs, something like the documentation on JSGF grammar here:
http://www.w3.org/TR/jsgf/
Statistical N-gram models in the ARPA format: ARPA language models are essentially "everything is possible" models of the language. Given any sequence of N or fewer words, they provide the probability of that sequence being seen in a sufficiently large representative sample of that language. Consider the text: wood pittsburgh cindy jean jean wood
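To make that description more concrete, here is a minimal sketch of what an ARPA file looks like and how it could be read in Python. The layout (a \data\ header with n-gram counts, one \N-grams: section per order, each line holding a log10 probability, the words, and an optional back-off weight, then \end\) is the usual ARPA layout. The numbers in SAMPLE_ARPA and the function name parse_arpa are made up for illustration and do not come from a real model or from PocketSphinx.

```python
# Illustrative ARPA text; the probabilities are invented, not trained.
SAMPLE_ARPA = r"""
\data\
ngram 1=4
ngram 2=3

\1-grams:
-0.6990 </s>
-99.0000 <s> -0.3010
-0.5229 wood -0.3010
-0.5229 jean -0.3010

\2-grams:
-0.3010 <s> wood
-0.3010 wood jean
-0.3010 jean </s>

\end\
"""


def parse_arpa(text):
    """Return {order: {ngram_tuple: (log10_prob, backoff_or_None)}}."""
    model = {}
    order = None
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and the header, which only lists n-gram counts.
        if not line or line == "\\data\\" or line.startswith("ngram "):
            continue
        if line == "\\end\\":
            break
        # Section marker such as "\2-grams:" switches the current order.
        if line.startswith("\\") and line.endswith("-grams:"):
            order = int(line[1:line.index("-")])
            model[order] = {}
            continue
        if order is None:
            continue
        fields = line.split()
        log10_prob = float(fields[0])
        words = tuple(fields[1:1 + order])
        # The back-off weight is optional; highest-order n-grams omit it.
        backoff = float(fields[1 + order]) if len(fields) > 1 + order else None
        model[order][words] = (log10_prob, backoff)
    return model


model = parse_arpa(SAMPLE_ARPA)
print(model[2][("wood", "jean")])
```

Each entry reads as "log10 probability of the last word given the preceding words", so a value of -0.3010 corresponds to a probability of about 0.5.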
Evaluation of ARPA format language models: Version 2 of the toolkit includes the ability to calculate perplexities of ARPA format language models. Handling of context cues: In version 1, the tags <s>, <p>, and <art> were all hard-wired to represent context cues, and the tag <s> was required to be in the vocabulary.
The ARPA format language model does not contain information as to which words are context cues, so if an ARPA format language model is used, then a context cues file may be specified as well. Output: The program can run in one of two modes.
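As a rough illustration of what "calculating perplexity" of an ARPA model involves, the sketch below scores a sentence with the usual back-off rule: use the longest n-gram that is listed, otherwise add the back-off weight of the history and retry with a shorter history. This is a simplified sketch, not the toolkit's actual implementation. The MODEL dictionary, its numbers, and the function names are assumptions made up for the example.

```python
# Hypothetical bigram model: {order: {ngram: (log10_prob, backoff_or_None)}}.
MODEL = {
    1: {
        ("<s>",): (-99.0, -0.3010),
        ("</s>",): (-0.6990, None),
        ("wood",): (-0.5229, -0.3010),
        ("jean",): (-0.5229, -0.3010),
    },
    2: {
        ("<s>", "wood"): (-0.3010, None),
        ("wood", "jean"): (-0.3010, None),
        ("jean", "</s>"): (-0.3010, None),
    },
}


def log10_prob(model, history, word):
    """log10 P(word | history) using back-off to shorter histories."""
    ngram = tuple(history) + (word,)
    order = len(ngram)
    entry = model.get(order, {}).get(ngram)
    if entry is not None:
        return entry[0]
    if order == 1:
        raise KeyError(f"out-of-vocabulary word: {word}")
    # N-gram not listed: apply the history's back-off weight (0.0 if the
    # history itself is missing or has no weight) and drop its oldest word.
    history_entry = model.get(order - 1, {}).get(tuple(history))
    backoff = 0.0
    if history_entry is not None and history_entry[1] is not None:
        backoff = history_entry[1]
    return backoff + log10_prob(model, tuple(history)[1:], word)


def perplexity(model, words, max_order=2):
    """Perplexity of one sentence, counting </s> but not <s> as a token."""
    tokens = ["<s>"] + list(words) + ["</s>"]
    total = 0.0
    for i in range(1, len(tokens)):
        history = tuple(tokens[max(0, i - max_order + 1):i])
        total += log10_prob(model, history, tokens[i])
    return 10 ** (-total / (len(tokens) - 1))


print(perplexity(MODEL, ["wood", "jean"]))
print(log10_prob(MODEL, ("jean",), "wood"))  # unseen bigram, backs off
```

Lower perplexity means the model finds the text less surprising, which is why it is a common way to compare language models on the same test text.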
There is actually not much more to say about the format than is said in those docs.
Besides, you'll probably want to prepare a text file with sample sentences and generate the language model file from it. There is an online tool that can do this for you: lmtool
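To give a feel for what such a tool does with your sample sentences, here is a minimal sketch that counts n-grams and turns bigram counts into unsmoothed log10 probabilities. This is only the counting step. Real tools also apply discounting and compute back-off weights, and I am not describing how lmtool works internally. The sentences and the function names are hypothetical.

```python
import math
from collections import Counter


def count_ngrams(sentences, max_order=2):
    """Count all n-grams up to max_order, with <s> and </s> markers added."""
    counts = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for n in range(1, max_order + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts


def bigram_log10_prob(counts, history_word, word):
    """Unsmoothed log10 P(word | history_word) = count(h, w) / count(h).

    Raises ValueError for unseen pairs (log of zero); handling those is
    exactly what discounting and back-off weights are for.
    """
    return math.log10(counts[(history_word, word)] / counts[(history_word,)])


# Hypothetical sample sentences, one per line in your text file.
counts = count_ngrams(["wood jean", "wood cindy"])
print(counts[("wood",)])                         # how often "wood" occurs
print(bigram_log10_prob(counts, "wood", "jean"))  # log10 of 1/2
```

The practical consequence is that the main way to adjust the model to your needs is to change the sample sentences: word sequences that appear more often there end up with higher probabilities.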