How can I add new words or vocabulary to Kaldi?

Question

I am trying to build an ASR system using the pre-trained models that are available as samples. I am stuck on how to add new words to a trained model so that it recognizes them correctly the next time they are spoken, something like a machine-learning update. Any ideas would be helpful.

coldsheep · Accepted Answer

There are two things you might need:

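Lexicon: Find a file like lexicon.txt in your data folder and add your words with their corresponding phone sequences, one pronunciation per line. The phones must be ones the acoustic model already knows, and a word with several pronunciations gets several lines, like: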
```
speech s p iy ch
the dh ax
the dh iy
```
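The lexicon edit can also be scripted. Below is a minimal sketch, assuming the lexicon is a plain-text file with one `word phone phone ...` entry per line; the function name `add_lexicon_entry`, the path, and the phone sequence in the usage note are hypothetical and not part of Kaldi.

```python
from pathlib import Path


def add_lexicon_entry(lexicon_path, word, phones):
    """Add `word` with its phone sequence unless that exact line already exists.

    Returns True if the file was changed, False if the entry was already there.
    """
    path = Path(lexicon_path)
    new_line = f"{word} {' '.join(phones)}"
    lines = []
    if path.exists():
        # Ignore blank lines so they do not end up at the top after sorting.
        lines = [l for l in path.read_text(encoding="utf-8").splitlines() if l.strip()]
    if new_line in lines:
        return False  # this pronunciation is already listed
    lines.append(new_line)
    # Keep entries sorted and free of duplicate lines so the file stays easy to check.
    path.write_text("\n".join(sorted(set(lines))) + "\n", encoding="utf-8")
    return True
```

A call might look like `add_lexicon_entry("data/local/dict/lexicon.txt", "kaldi", ["k", "aa", "l", "d", "iy"])`, where both the path and the phones are placeholders to replace with your own.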
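Language Model: Find a file like XXX.lm in your data folder (an ARPA-format language model), add your word to the 1-gram section with a log10 probability, and increase the 1-gram count in the header to match, like: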
```
\data\
ngram 1=200
ngram 2=4000
...

\1-grams:
-7.3241 the
...
```
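The same edit can be sketched in code. This is an illustration only, assuming a well-formed ARPA file whose 1-gram section starts with `\1-grams:` and ends at a blank line or the next backslash marker; `add_unigram` is a hypothetical helper, not a Kaldi or SRILM function. No back-off weight is written for the new word, which ARPA readers generally treat as a weight of 0, and the other probabilities are not renormalized.

```python
import re


def add_unigram(arpa_text, word, log10_prob):
    """Return ARPA text with `word` added as a 1-gram and the header count bumped."""
    entry = f"{log10_prob:.4f}\t{word}"
    out = []
    in_unigrams = False
    added = False
    for line in arpa_text.splitlines():
        stripped = line.strip()
        count = re.fullmatch(r"ngram 1=(\d+)", stripped)
        if count:
            # One more 1-gram will be present after the edit.
            line = f"ngram 1={int(count.group(1)) + 1}"
        elif stripped == "\\1-grams:":
            in_unigrams = True
        elif in_unigrams:
            fields = stripped.split()
            if stripped == "" or stripped.startswith("\\"):
                # End of the 1-gram section: insert the new entry before it closes.
                out.append(entry)
                added, in_unigrams = True, False
            elif len(fields) >= 2 and fields[1] == word:
                raise ValueError(f"{word!r} is already in the 1-gram section")
        out.append(line)
    if in_unigrams:
        # The file ended while still inside the 1-gram section.
        out.append(entry)
        added = True
    if not added:
        raise ValueError("no \\1-grams: section found")
    return "\n".join(out) + "\n"
```

Because the new word only appears as a 1-gram, the decoder can reach it only by backing off, so it tends to be recognized less readily than words that also occur in higher-order n-grams.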

After this, rebuild the decoding graph HCLG.fst from these two updated files.
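In a standard Kaldi recipe, rebuilding usually means regenerating the lang directory with `utils/prepare_lang.sh`, converting the ARPA file to G.fst with `arpa2fst`, and composing the graph with `utils/mkgraph.sh`. The sketch below only assembles those command lines; every directory name, the `<UNK>` out-of-vocabulary entry, and the helper `rebuild_commands` are assumptions to adapt to your setup, and the phone symbols in the new lang directory have to stay compatible with the ones the pre-trained model was built with.

```python
import shlex


def rebuild_commands(dict_dir, lang_dir, arpa_path, model_dir, oov="<UNK>"):
    """Build the command lines for regenerating L.fst, G.fst and HCLG.fst."""
    tmp_dir = f"{lang_dir}_tmp"        # scratch directory for prepare_lang.sh
    graph_dir = f"{model_dir}/graph"   # where HCLG.fst will be written
    return [
        # Lexicon -> L.fst, words.txt and the rest of the lang directory.
        ["utils/prepare_lang.sh", dict_dir, oov, tmp_dir, lang_dir],
        # ARPA language model -> G.fst, using the new word symbol table.
        ["arpa2fst", "--disambig-symbol=#0",
         f"--read-symbol-table={lang_dir}/words.txt",
         arpa_path, f"{lang_dir}/G.fst"],
        # Compose everything into the decoding graph.
        ["utils/mkgraph.sh", lang_dir, model_dir, graph_dir],
    ]


for cmd in rebuild_commands("data/local/dict", "data/lang_new",
                            "data/local/lm/XXX.lm", "exp/model"):
    print(shlex.join(cmd))
```

To actually run them, each list can be passed to `subprocess.run(cmd, check=True)` from the recipe directory with Kaldi's `path.sh` environment loaded. Some model types need extra `mkgraph.sh` options, so check the recipe the model came from.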

Note: The probabilities in the language model affect the recognition results, so you need to choose a suitable value for the new word, or use a toolkit such as SRILM to estimate the language model from the text of your corpus.
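For picking a value by hand, one rough starting point is the word's relative frequency in your text, expressed as a base-10 logarithm as ARPA files expect. The sketch below shows only this maximum-likelihood estimate; toolkits such as SRILM additionally apply smoothing and compute back-off weights, so their numbers will differ. The function name and the counts are made up for illustration.

```python
import math


def unigram_log10_prob(word_count, total_tokens):
    """Maximum-likelihood unigram probability as a log10 value."""
    if word_count <= 0 or total_tokens <= 0 or word_count > total_tokens:
        raise ValueError("need 0 < word_count <= total_tokens")
    return math.log10(word_count / total_tokens)


# A word seen 5 times in a 100,000-token corpus.
print(f"{unigram_log10_prob(5, 100_000):.4f}")  # prints -4.3010
```

A less negative value makes the decoder more willing to output the word, at the cost of more false insertions.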

Tags: speech-recognition, models, toolkit, voice-recognition

Vipin YoYo
