
Data structure for storing word associations

I'm trying to implement word prediction by analyzing sentences. Consider the following [rather boring] sentences:

Call ABC
Call ABC again
Call DEF

I'd like to have a data structure for the above sentences as follows:

Call: (ABC, 2), (again, 1), (DEF, 1)
ABC: (Call, 2), (again, 1)
again: (Call, 1), (ABC, 1)
DEF: (Call, 1)

In general, Word: (Word_it_appears_with, Frequency), ....

Please note the inherent redundancy in this type of data: if the frequency of ABC is 2 under Call, the frequency of Call under ABC is necessarily 2 as well, so every count is stored twice. How do I optimize this?

The idea is to use this data when a new sentence is being typed. For example, if Call has been typed, from the data, it's easy to say ABC is more likely to be present in the sentence, and offer it as the first suggestion, followed by again and DEF.
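A minimal Python sketch of the structure described above, assuming words are split on whitespace and a word is counted at most once per sentence. The names `build_associations` and `suggest` are hypothetical, chosen only for illustration. It stores every count in both directions, so it shows the redundancy rather than removing it:

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_associations(sentences):
    """Map each word to a Counter of the words it shares a sentence with."""
    assoc = defaultdict(Counter)
    for sentence in sentences:
        # De-duplicate while keeping the original word order.
        words = list(dict.fromkeys(sentence.split()))
        for a, b in combinations(words, 2):
            # Redundant on purpose: the same count is kept under both words.
            assoc[a][b] += 1
            assoc[b][a] += 1
    return assoc


def suggest(assoc, typed_word):
    """Return co-occurring words, most frequent first."""
    counts = assoc.get(typed_word, Counter())
    return [word for word, _ in counts.most_common()]


sentences = ["Call ABC", "Call ABC again", "Call DEF"]
assoc = build_associations(sentences)
print(suggest(assoc, "Call"))  # ['ABC', 'again', 'DEF']
```

Words with equal counts come out in the order they were first seen, which is why again precedes DEF here.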

I realise this is one of a million possible ways of implementing prediction, and I eagerly look forward to suggestions of other ways to do it.

Thanks

asked Nov 13 '22 14:11 by WeNeigh


1 Answer

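Maybe use a bidirectional (that is, undirected) weighted graph. You can store the words as nodes and the co-occurrence frequencies as edge weights. Since each pair of words then shares a single edge, each frequency is stored only once.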
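One possible way to sketch this in Python, assuming whitespace tokenization and at most one count per word pair per sentence. The class name `CooccurrenceGraph` and its methods are hypothetical, not from any library. Each edge weight is keyed by an unordered pair (a `frozenset`), so the count exists once regardless of which word you start from:

```python
from collections import Counter, defaultdict
from itertools import combinations


class CooccurrenceGraph:
    """Undirected weighted graph: nodes are words, edge weights are counts."""

    def __init__(self):
        self.weights = Counter()           # frozenset({a, b}) -> frequency
        self.neighbors = defaultdict(set)  # word -> adjacent words

    def add_sentence(self, sentence):
        words = list(dict.fromkeys(sentence.split()))  # unique, in order
        for a, b in combinations(words, 2):
            self.weights[frozenset((a, b))] += 1  # stored once per pair
            self.neighbors[a].add(b)
            self.neighbors[b].add(a)

    def frequency(self, a, b):
        # Counter returns 0 for a pair that was never seen.
        return self.weights[frozenset((a, b))]

    def suggestions(self, word):
        # Highest frequency first; ties are broken alphabetically
        # (an arbitrary choice made for this sketch).
        nbrs = self.neighbors.get(word, ())
        return sorted(nbrs, key=lambda w: (-self.frequency(word, w), w))


graph = CooccurrenceGraph()
for s in ["Call ABC", "Call ABC again", "Call DEF"]:
    graph.add_sentence(s)

print(graph.frequency("Call", "ABC"))  # 2
print(graph.frequency("ABC", "Call"))  # 2, read from the same stored entry
```

The adjacency sets still list each neighbor under both words, but they hold no counts; the trade-off is that suggestions are sorted at lookup time instead of being kept pre-ranked.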

answered Dec 30 '22 03:12 by Mike Dinescu