Add new words to the fugashi dictionary

Question

I'm using fugashi to extract words from sentences. How do I add new terms that are not in the fugashi dictionary?

For example, "ユーチューブ" (YouTube) is split into "ユー" and "チューブ".

import fugashi

tagger = fugashi.Tagger()
nodes = tagger.parseToNodeList("ユーチューブ")
# Keep only nouns (pos1 is the top-level part of speech)
goodpos = ['名詞']
words = [nn.surface for nn in nodes if nn.feature.pos1 in goodpos]
print(words)

=> ['ユー', 'チューブ']

polm23 · Accepted Answer

I haven't written a proper guide for this yet, but basically you can follow the MeCab docs, using fugashi-build-dict in place of mecab-dict-index.

Briefly: first, make a CSV file that uses the same format as your system dictionary. The example below is based on unidic-lite.

令和,4786,4786,8205,名詞,固有名詞,一般,*,*,*,レイワ,令和,令和,レーワ,令和,レーワ,固,*,*,*,*,*,*,*,レイワ,レイワ,レイワ,レイワ,"1,0",*,*,*,*
㋿,5969,5969,2588,補助記号,一般,*,*,*,*,,㋿,㋿,,㋿,,記号,*,*,*,*,*,*,*,,,,,*,*,*,*,999999
㋿,4786,4786,3992,名詞,固有名詞,一般,*,*,*,レイワ,令和,㋿,レーワ,㋿,レーワ,固,*,*,*,*,*,*,*,レイワ,レイワ,レイワ,レイワ,"1,0",*,*,*,*
夢夢,4786,4786,8205,名詞,固有名詞,一般,*,*,*,レイワ,令和,令和,レーワ,令和,レーワ,固,*,*,*,*,*,*,*,レイワ,レイワ,レイワ,レイワ,"1,0",*,*,*,*
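Rows like these can be produced by copying an existing entry and swapping a few fields. Below is a minimal sketch of that idea, not an official fugashi tool. It relies only on the MeCab CSV layout (surface, left ID, right ID, cost, then the feature columns). The make_entry helper and the replacement values for ユーチューブ are illustrative assumptions, and the IDs and cost are simply inherited from the template row. Which feature column holds which UniDic field should be checked against your dictionary.

```python
import csv
import io

# Template row copied from the unidic-lite example above.
TEMPLATE = (
    '令和,4786,4786,8205,名詞,固有名詞,一般,*,*,*,レイワ,令和,令和,レーワ,令和,レーワ,'
    '固,*,*,*,*,*,*,*,レイワ,レイワ,レイワ,レイワ,"1,0",*,*,*,*'
)


def parse_row(line):
    """Split one dictionary CSV line into fields, honoring quoted commas."""
    return next(csv.reader([line]))


def make_entry(template_line, replacements):
    """Hypothetical helper: copy a template row, swapping whole-field values.

    `replacements` maps an old field value to its new value; fields that are
    not listed (IDs, cost, '*' placeholders) are kept unchanged.
    """
    row = parse_row(template_line)
    new_row = [replacements.get(field, field) for field in row]
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow(new_row)
    return buf.getvalue().rstrip("\n")


# Placeholder values for illustration only; adjust readings and lemma as needed.
line = make_entry(
    TEMPLATE,
    {"令和": "ユーチューブ", "レイワ": "ユーチューブ", "レーワ": "ユーチューブ"},
)
print(line)
```

Using the csv module rather than plain string splitting keeps the quoted "1,0" field intact when the row is read and written back.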

You can make this by copying entries from the UniDic source and editing fields. Then you run this command:

fugashi-build-dict -d dicdir/ -u mydic.dic mydic.csv

dicdir is the location of your system dictionary, and mydic.csv is the CSV file you made. This creates the mydic.dic file, which you can then use with fugashi by passing -u mydic.dic as the Tagger argument, e.g. fugashi.Tagger('-u mydic.dic').
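As a rough sketch of the last step, the compiled dictionary can be loaded from Python by passing the -u option in the Tagger argument string. The helper names user_dict_args and noun_surfaces are made up for this example, and it assumes mydic.dic has already been built and sits in the working directory.

```python
def user_dict_args(user_dic_path):
    """Build the MeCab option string that points at a user dictionary."""
    return f"-u {user_dic_path}"


def noun_surfaces(text, user_dic_path):
    """Tokenize `text` with the user dictionary loaded and return noun surfaces."""
    # Imported here so user_dict_args can be used without fugashi installed.
    import fugashi

    tagger = fugashi.Tagger(user_dict_args(user_dic_path))
    nodes = tagger.parseToNodeList(text)
    return [node.surface for node in nodes if node.feature.pos1 == "名詞"]


# Example call (requires fugashi, a system dictionary, and a built mydic.dic):
# print(noun_surfaces("ユーチューブ", "mydic.dic"))
```

If the new entry is picked up, ユーチューブ should come back as a single token. If it is still split, the cost in the CSV row may need to be lowered.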

Tags:

text-mining

mecab

Penguin_.
