
Python data structure for thesaurus

I need to define my own synonyms for about 100 words of my choice. For testing, I am adding the entries manually:

t = {}
t.update({'Strong':['Strong', 'Able', 'Active', 'Big',
                    'Energy', 'Firm',
                    'Force', 'Heavy', 'Robust', 'Secure',
                    'Solid', 'Stable', 'Steady',
                    'Tough', 'Vigor', 'Might',
                    'Rugged', 'Sound']})

t.update({'Fast':['Fast', 'Agile', 'Brisk', 'Hot', 'Quick',
              'Rapid', 'Swift', 'Accel', 'Active',
              'Dash', 'Flash', 'Fly', 'Race', 'Snap',
              'Wing', 'Streak', 'Time', 'Chop', 'Jiffy',
              'Split', 'Bat', 'Crazy', 'Double', 'Scream',
              'Sonic', 'Super', 'Ball', 'Speed']})

So I am creating an empty dictionary, then taking words like "Strong" and "Fast" and mapping each one to a list of synonyms (which I need to be able to choose myself).
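For illustration, here is a minimal sketch of how such a mapping could be written as a single dict literal and queried. The shortened word lists and the helper names `synonyms` and `headwords_for` are hypothetical, chosen only for this example.

```python
# Hypothetical, shortened thesaurus written as one dict literal
# instead of repeated t.update(...) calls.
thesaurus = {
    'Strong': ['Strong', 'Able', 'Active', 'Firm'],
    'Fast': ['Fast', 'Agile', 'Active', 'Quick'],
}


def synonyms(word):
    """Return the hand-picked synonyms for word, or an empty list."""
    return thesaurus.get(word, [])


def headwords_for(word):
    """Return every head word whose synonym list contains word."""
    return [head for head, syns in thesaurus.items() if word in syns]
```

With only about 100 entries, the linear scan in `headwords_for` should be cheap enough that no extra index is needed.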

Since I only need about 100 word mappings, is this a reasonable approach? Or is there a better way to implement this?

I am also looking at using NLTK and its wordnet module. However, this module takes a while to run, and it seems to offer no way to add my own synonyms.

William Ross asked Oct 20 '25 13:10


2 Answers

I would organize your thesaurus as a graph. First, keep all the words in a dictionary that maps each word to an integer key; then store the graph as adjacency lists, since it will be sparse.

# Map each word to an integer id.
w = {'Fast': 0, 'Strong': 1, 'Able': 2, 'Active': 3, 'Big': 4, ...}

# Adjacency lists: word id -> ids of its synonyms.
t = {0: [1, 2, 3, ...], ...}

It could scale better for large data sets, since each word is stored as a string only once and the synonym lists hold just integer ids.
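A runnable sketch of this idea might look like the following. The sample pairs, the helper names `get_id`, `add_synonym` and `synonyms`, and the choice to treat every synonym link as two-way are all assumptions made for the example, not part of the outline above.

```python
from collections import defaultdict

# Hypothetical sample data: pairs of words treated as synonyms.
pairs = [('Fast', 'Quick'), ('Fast', 'Rapid'),
         ('Strong', 'Firm'), ('Fast', 'Firm')]

word_to_id = {}           # word -> integer id
id_to_word = []           # integer id -> word
graph = defaultdict(set)  # word id -> ids of its synonyms


def get_id(word):
    """Return the id for word, assigning the next free id if it is new."""
    if word not in word_to_id:
        word_to_id[word] = len(id_to_word)
        id_to_word.append(word)
    return word_to_id[word]


def add_synonym(a, b):
    """Link a and b in both directions (assumption: synonymy is symmetric)."""
    ia, ib = get_id(a), get_id(b)
    graph[ia].add(ib)
    graph[ib].add(ia)


for a, b in pairs:
    add_synonym(a, b)


def synonyms(word):
    """Return the sorted synonyms of word, or an empty list if unknown."""
    if word not in word_to_id:
        return []
    return sorted(id_to_word[i] for i in graph[word_to_id[word]])
```

Using sets for the adjacency lists avoids duplicate edges when the same pair is added twice.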

thyago stall answered Oct 22 '25 02:10


In an actual thesaurus, individual words may belong to multiple sets of synonyms. For example, "fast" as in quick might be in one list, while "fast" as in secure might be in another.

I would map each word to a list of "sense groups," and then each sense group would map to a list of words.
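As a rough sketch of that layout, assuming made-up sense group names and word lists (none of them come from a real thesaurus), the two mappings could be built like this:

```python
# Hypothetical sense groups: group name -> words sharing that sense.
sense_groups = {
    'fast/quick': ['Fast', 'Quick', 'Rapid', 'Swift'],
    'fast/secure': ['Fast', 'Firm', 'Secure', 'Stable'],
    'strong': ['Strong', 'Firm', 'Robust', 'Tough'],
}

# Reverse index: word -> names of the sense groups it belongs to.
word_senses = {}
for sense, words in sense_groups.items():
    for word in words:
        word_senses.setdefault(word, []).append(sense)


def synonyms_by_sense(word):
    """Return {sense name: other words in that sense} for word."""
    return {
        sense: [w for w in sense_groups[sense] if w != word]
        for sense in word_senses.get(word, [])
    }
```

Here `synonyms_by_sense('Fast')` keeps the "quick" and "secure" senses apart, so a caller can pick the one that fits the context.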

Adrian McCarthy answered Oct 22 '25 01:10