I'm making a program that can respond to what the user says, something like a chatterbot. I wonder whether I can make it understand that two or more words have the same meaning.
For example, I make it answer "yes" when the user says "are you scared of the dark?". But "scared", "afraid", and "frightened" have the same meaning. If the user uses "afraid" instead of "scared", how can the program recognize that the two words mean the same thing, so that it maps the input to the "are you scared of the dark?" question and answers "yes"?
I wonder if I could make an array of Strings like {"hello", "hi", "hey"}
or {"afraid", "scared", "frightened"}
etc. Thank you for helping.
P.S.: the program I wrote doesn't use the English language, so I'm afraid I can't use a library or API, but I have no problem defining the synonym list myself.
I would at least use the nifty feature known as Object orientation:
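    import java.util.TreeSet;

    public class Word implements Comparable<Word> {
        private final String word;
        // Must be initialized, otherwise addSynonym() throws a NullPointerException.
        private final TreeSet<Word> synonyms = new TreeSet<Word>();

        public Word(final String word) {
            this.word = word;
        }

        public String getWord() {
            return word;
        }

        public TreeSet<Word> getSynonyms() {
            return synonyms;
        }

        public void addSynonym(final Word word) {
            synonyms.add(word);
        }

        @Override
        public int compareTo(final Word other) {
            if (this.word == null) {
                return -1;
            }
            if (other == null || other.getWord() == null) {
                return 1;
            }
            return this.word.compareTo(other.getWord());
        }
    }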
So we have a Word class with a TreeSet (for fast searching) of synonyms. It could be populated, for instance, from a properties file like:
afraid=scared
hello=hey
and all the words could be stored in a TreeSet:
    private TreeSet<Word> allWords = new TreeSet<Word>();
    String key;
    String value;
    // loop through all properties
    Word word = new Word(key);
    Word synonym = new Word(value);
    if (allWords.contains(word)) {
        // find the word already stored in the set
        allWords.tailSet(word).first().addSynonym(synonym);
    } else {
        word.addSynonym(synonym);
        allWords.add(word);
    }
It would need some improvement. There is an open question about how to store the words: should every word be stored in allWords, or just one entry per group of synonyms? It might also be better to use some kind of TreeMap, like
    final TreeMap<Word, List<Word>> allWords;
Still, it might point you in the right direction. This is just off the top of my head.
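One possible answer to the storage question above is to give each synonym group a single canonical word and look up every input word in a map. The sketch below is only an illustration: `SynonymIndex` and its method names are made up for this example, it uses a `HashMap` keyed by plain strings rather than the `TreeMap` of `Word` objects suggested above, and lowercasing with `Locale.ROOT` is an assumption that may need adjusting for your language.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: maps every word of a synonym group to one canonical word.
class SynonymIndex {
    private final Map<String, String> canonicalByWord = new HashMap<>();

    // Registers a group; the first word is (arbitrarily) used as the canonical form.
    void addGroup(String... words) {
        if (words.length == 0) {
            return;
        }
        String canonical = normalize(words[0]);
        for (String w : words) {
            canonicalByWord.put(normalize(w), canonical);
        }
    }

    // Returns the canonical form, or the normalized word itself if it belongs to no group.
    String canonical(String word) {
        String key = normalize(word);
        return canonicalByWord.getOrDefault(key, key);
    }

    // True when both words map to the same canonical form.
    boolean sameMeaning(String a, String b) {
        return canonical(a).equals(canonical(b));
    }

    // Rewrites a sentence token by token; tokens are split on whitespace only,
    // so a token with attached punctuation such as "dark?" is left unchanged.
    String canonicalSentence(String sentence) {
        StringBuilder out = new StringBuilder();
        for (String token : sentence.trim().split("\\s+")) {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(canonical(token));
        }
        return out.toString();
    }

    private static String normalize(String s) {
        return s.trim().toLowerCase(Locale.ROOT);
    }
}
```

With the group `addGroup("scared", "afraid", "frightened")` registered, `canonicalSentence("Are you afraid of the dark?")` gives `are you scared of the dark?`, which the bot can then match against the one question it already knows.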
The best idea is to store the synonyms in a text file (or in a database), then query that data set to find the matching words.
Below is a database model diagram for it -
You can query the refSynomy table to obtain the synonyms.
The queries for the above structure in Postgres would be -
CREATE TABLE "testing"."synomy" (
"idSynomy" int2 NOT NULL,
"word" text NOT NULL,
CONSTRAINT "synomy_pkey" PRIMARY KEY ("idSynomy") NOT DEFERRABLE INITIALLY IMMEDIATE
)
WITH (OIDS=FALSE);
ALTER TABLE "testing"."synomy" OWNER TO "dulitharasangawijewantha";
CREATE UNIQUE INDEX "synomy_idSynomy_key" ON "testing"."<table_name>" USING btree("idSynomy" ASC NULLS LAST);
CREATE TABLE "testing"."refSynomy" (
"idSynomyref" int2 NOT NULL,
"refSynomy" int2 NOT NULL,
CONSTRAINT "refSynomy_pkey" PRIMARY KEY ("idSynomyref") NOT DEFERRABLE INITIALLY IMMEDIATE,
CONSTRAINT "refSynomy" FOREIGN KEY ("refSynomy") REFERENCES "testing"."synomy" ("idSynomy") ON UPDATE NO ACTION ON DELETE NO ACTION NOT DEFERRABLE INITIALLY IMMEDIATE,
CONSTRAINT "idSynomy" FOREIGN KEY ("idSynomyref") REFERENCES "testing"."synomy" ("idSynomy") ON UPDATE NO ACTION ON DELETE NO ACTION NOT DEFERRABLE INITIALLY IMMEDIATE
)
WITH (OIDS=FALSE);
ALTER TABLE "testing"."refSynomy" OWNER TO "dulitharasangawijewantha";
The reason I suggest a small database -
You can use your initial idea of storing them in arrays, but that soon becomes hard to maintain, so my suggestion is a database. If you want your application to be portable, you can go for an SQLite solution, so that the database lives inside a single file. Hope this helps.
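For the text-file variant mentioned at the start of this answer, a minimal sketch in Java could look like the following. The file layout (one comma-separated synonym group per line) and the names `SynonymFile` and `fromLines` are assumptions made for this example, not something the database model above prescribes.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical loader for a text file with one synonym group per line,
// for example: afraid,scared,frightened
class SynonymFile {
    // Maps every word to the full group it belongs to (the group includes the word itself).
    static Map<String, Set<String>> fromLines(List<String> lines) {
        Map<String, Set<String>> groupByWord = new HashMap<>();
        for (String line : lines) {
            Set<String> group = new LinkedHashSet<>();
            for (String part : line.split(",")) {
                String word = part.trim();
                if (!word.isEmpty()) {
                    group.add(word);
                }
            }
            // All words of a line share the same Set instance; blank lines add nothing.
            for (String word : group) {
                groupByWord.put(word, group);
            }
        }
        return groupByWord;
    }
}
```

The lines can be read with `java.nio.file.Files.readAllLines(path, StandardCharsets.UTF_8)`. Two words then have the same meaning when `groupByWord.get(a)` contains `b`.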