I'm trying to implement word prediction by analyzing sentences. Consider the following [rather boring] sentences:
Call ABC
Call ABC again
Call DEF
I'd like to have a data structure for the above sentences as follows:
Call: (ABC, 2), (again, 1), (DEF, 1)
ABC: (Call, 2), (again, 1)
again: (Call, 1), (ABC, 1)
DEF: (Call, 1)
In general, Word: (Word_it_appears_with, Frequency), ....
Please note the inherent redundancy in this type of data. Obviously, if the frequency of ABC is 2 under Call, the frequency of Call is 2 under ABC. How do I optimize this?
The idea is to use this data when a new sentence is being typed. For example, if Call has been typed, from the data, it's easy to say ABC is more likely to be present in the sentence, and offer it as the first suggestion, followed by again and DEF.
I realise this is one of a million possible ways of implementing prediction, and I eagerly look forward to suggestions of other ways to do it.
Thanks
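Maybe use a bidirectional (undirected) weighted graph. You can store the words as nodes and the frequencies as edge weights, so each pair of words is stored only once and the redundancy goes away.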
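Here is a minimal sketch of that idea in Python, not a definitive implementation. It assumes words are split on whitespace and that a word is counted at most once per sentence. The names CooccurrenceGraph, add_sentence, frequency and suggest are made up for illustration. Each frequency is kept once, keyed by the unordered pair of words, and a small neighbor index per word makes lookups fast.

```python
from collections import Counter, defaultdict
from itertools import combinations


class CooccurrenceGraph:
    """Undirected weighted graph: words are nodes, co-occurrence counts are edge weights."""

    def __init__(self):
        # frozenset({a, b}) -> count; one entry per unordered pair of words
        self.weights = Counter()
        # word -> dict used as an insertion-ordered set of adjacent words
        self.neighbors = defaultdict(dict)

    def add_sentence(self, sentence):
        # Assumption: whitespace tokenization, each word counted once per sentence
        words = list(dict.fromkeys(sentence.split()))
        for a, b in combinations(words, 2):
            self.weights[frozenset((a, b))] += 1
            self.neighbors[a][b] = None
            self.neighbors[b][a] = None

    def frequency(self, a, b):
        # Same answer for (a, b) and (b, a) because the key is unordered
        return self.weights[frozenset((a, b))]

    def suggest(self, word):
        # Most frequent neighbors first; ties keep first-seen order (stable sort)
        candidates = self.neighbors.get(word, {})
        return sorted(candidates, key=lambda other: -self.frequency(word, other))


graph = CooccurrenceGraph()
for sentence in ["Call ABC", "Call ABC again", "Call DEF"]:
    graph.add_sentence(sentence)

print(graph.suggest("Call"))           # ['ABC', 'again', 'DEF']
print(graph.frequency("ABC", "Call"))  # 2
```

The neighbor index still mentions each edge from both ends, but it holds no counts, so a frequency only ever has to be updated in one place.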