I want to use the synonym token filter in Elasticsearch for an index. I downloaded the Prolog version of WordNet 3.0 and found the wn_s.pl file, which Elasticsearch can understand. However, the file contains synonyms for all sorts of words and phrases, while I am only interested in supporting synonyms for nouns. Is there a way to extract only those entries?
Given that the format of wn_s.pl is:
s(112947045,1,'usance',n,1,0).
s(200001742,1,'breathe',v,1,25).
a quick, rough way to do that is to run the following in your terminal, which keeps only the lines of that file containing the string ',n,':
grep ",n," wn_s.pl > wn_s_nouns_only.pl
The file wn_s_nouns_only.pl will then contain only the entries that are marked as nouns.
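If you want to be a bit stricter than a plain substring match, here is a minimal sketch that matches the whole entry, so the n has to sit in the part-of-speech position and a ',n,' inside a quoted word cannot match. It assumes every entry has the same shape as the two sample lines above (two numbers, a quoted word, the type letter, two more numbers, then a closing parenthesis and a period). The function name filter_nouns is just an illustrative helper, not part of WordNet or Elasticsearch.

```shell
# Print only the entries whose part-of-speech field is exactly n.
# Assumes entries look like: s(112947045,1,'usance',n,1,0).
filter_nouns() {
  grep -E "^s\([0-9]+,[0-9]+,'.*',n,[0-9]+,[0-9]+\)\.[[:space:]]*\$" "$1"
}

# Usage, assuming wn_s.pl is in the current directory:
#   filter_nouns wn_s.pl > wn_s_nouns_only.pl
```

For a file that follows this format, the result should be the same as with the simpler grep above; the stricter pattern only matters if a quoted word ever contains ',n,' itself.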