Data structure for storing word associations

Question

I'm trying to implement word prediction by analyzing sentences. Consider the following [rather boring] sentences:

Call ABC
Call ABC again
Call DEF

I'd like to have a data structure for the above sentences as follows:

Call: (ABC, 2), (again, 1), (DEF, 1)
ABC: (Call, 2), (again, 1)
again: (Call, 1), (ABC, 1)
DEF: (Call, 1)

In general, Word: (Word_it_appears_with, Frequency), ....
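For reference, the table above can be built with a plain nested map. This is only a minimal sketch: the class name `WordAssociations` and the method `build` are made up for illustration, and it assumes that words are separated by whitespace and that a word repeated within one sentence counts once.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical helper, not part of any library.
final class WordAssociations {

    // Returns: word -> (co-occurring word -> number of sentences containing both).
    static Map<String, Map<String, Integer>> build(List<String> sentences) {
        Map<String, Map<String, Integer>> table = new TreeMap<>();
        for (String sentence : sentences) {
            // Assumption: whitespace-separated words, each counted once per sentence.
            Set<String> words = new LinkedHashSet<>(Arrays.asList(sentence.trim().split("\\s+")));
            for (String a : words) {
                for (String b : words) {
                    if (!a.equals(b)) {
                        // Both directions are stored, which is the redundancy in question.
                        table.computeIfAbsent(a, k -> new TreeMap<>()).merge(b, 1, Integer::sum);
                    }
                }
            }
        }
        return table;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Integer>> table =
                build(List.of("Call ABC", "Call ABC again", "Call DEF"));
        table.forEach((word, seenWith) -> System.out.println(word + ": " + seenWith));
    }
}
```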

Please note the inherent redundancy in this type of data: if the frequency of ABC is 2 under Call, the frequency of Call under ABC is necessarily 2 as well. How can I store this without duplicating every count?

The idea is to use this data when a new sentence is being typed. For example, once Call has been typed, the data shows that ABC is the most likely word to appear in the sentence, so it can be offered as the first suggestion, followed by again and DEF.
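To make the lookup concrete, here is a rough sketch of ranking suggestions by frequency. The class `Suggester` and the method `suggest` are hypothetical names, and the table is hard-coded from the example sentences. Ties (again and DEF both have frequency 1) are assumed to keep the order in which the words were first recorded.

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper, not part of any library.
final class Suggester {

    // Returns the words seen alongside `typed`, most frequent first.
    // The sort is stable, so equal counts keep the map's iteration order.
    static List<String> suggest(Map<String, Map<String, Integer>> table, String typed) {
        return table.getOrDefault(typed, Map.of()).entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue(Comparator.reverseOrder()))
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Counts for "Call", taken from the example sentences.
        Map<String, Integer> seenWithCall = new LinkedHashMap<>();
        seenWithCall.put("ABC", 2);
        seenWithCall.put("again", 1);
        seenWithCall.put("DEF", 1);

        Map<String, Map<String, Integer>> table = Map.of("Call", seenWithCall);
        System.out.println(suggest(table, "Call")); // prints [ABC, again, DEF]
    }
}
```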

I realise this is one of a million possible ways of implementing prediction, and I eagerly look forward to suggestions of other ways to do it.

Thanks

Mike Dinescu · Accepted Answer

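Maybe use a bidirectional (that is, undirected) weighted graph. You can store the words as nodes and the frequencies as edge weights. Each pair of words is then a single edge, so its frequency is stored only once.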
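One possible way to sketch such a graph in Java is shown below. The answer does not specify an implementation, so everything here is an assumption: the class name `WordGraph`, its methods, and the choice to key each edge by the two words in sorted order so that (Call, ABC) and (ABC, Call) map to the same entry.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of an undirected weighted graph of words.
final class WordGraph {
    // One entry per unordered pair of words; the value is the shared frequency.
    private final Map<List<String>, Integer> weights = new HashMap<>();
    // word -> neighbouring words, so a lookup does not have to scan every edge.
    private final Map<String, Set<String>> neighbours = new HashMap<>();

    // Canonical key: the two words in sorted order.
    private static List<String> edge(String a, String b) {
        return a.compareTo(b) <= 0 ? List.of(a, b) : List.of(b, a);
    }

    void addSentence(String sentence) {
        // Assumption: whitespace-separated words, each counted once per sentence.
        List<String> words = new ArrayList<>(
                new LinkedHashSet<>(Arrays.asList(sentence.trim().split("\\s+"))));
        for (int i = 0; i < words.size(); i++) {
            for (int j = i + 1; j < words.size(); j++) {
                String a = words.get(i);
                String b = words.get(j);
                weights.merge(edge(a, b), 1, Integer::sum);
                neighbours.computeIfAbsent(a, k -> new HashSet<>()).add(b);
                neighbours.computeIfAbsent(b, k -> new HashSet<>()).add(a);
            }
        }
    }

    // Same result regardless of argument order.
    int frequency(String a, String b) {
        return weights.getOrDefault(edge(a, b), 0);
    }

    Set<String> neighboursOf(String word) {
        return neighbours.getOrDefault(word, Set.of());
    }

    int edgeCount() {
        return weights.size();
    }

    public static void main(String[] args) {
        WordGraph graph = new WordGraph();
        graph.addSentence("Call ABC");
        graph.addSentence("Call ABC again");
        graph.addSentence("Call DEF");
        System.out.println(graph.frequency("Call", "ABC")); // prints 2
        System.out.println(graph.frequency("ABC", "Call")); // prints 2
        System.out.println(graph.edgeCount());              // prints 4
    }
}
```

In this sketch only the counts are deduplicated. The neighbour sets still list each word under both endpoints, which trades a little memory for a fast lookup of everything a given word has appeared with.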

Tags: java, string, data-structures, artificial-intelligence, prediction

WeNeigh
