
How to store and check synonyms of a string in Java

I'm making a program that responds to what the user says, something like a chatterbot. I wonder whether I can make it understand that two or more words have the same meaning.

For example, I make it answer "yes" when the user says "are you scared of the dark?". But "scared", "afraid", and "frightened" have the same meaning. If the user says "afraid" instead of "scared", how can the program recognize that the two words mean the same thing, match the input to the "are you scared of the dark?" question, and answer "yes"?

I wonder if I could make arrays of strings like {"hello", "hi", "hey"} or {"afraid", "scared", "frightened"}, etc. Thank you for helping.

P.S.: the program I wrote isn't in English, so I'm afraid I can't use a library or API, but I have no problem defining the synonym list myself.
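The array idea above can work as-is. Here is a minimal sketch, assuming hand-defined groups: every word maps to the first word of its group (a canonical form), so two sentences match when their normalized forms are equal. `SynonymMatcher` and its methods are hypothetical names, and the tokenizer assumes words are separated by spaces or punctuation, which may not hold for every language.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: maps every word to the first word of its synonym group.
final class SynonymMatcher {
    private final Map<String, String> canonical = new HashMap<>();
    private final Map<String, String> answers = new HashMap<>();

    // Register a group such as {"afraid", "scared", "frightened"}.
    void addGroup(String... words) {
        String head = words[0].toLowerCase(Locale.ROOT);
        for (String w : words) {
            canonical.put(w.toLowerCase(Locale.ROOT), head);
        }
    }

    // Replace each word by its canonical form; unknown words are kept unchanged.
    String normalize(String sentence) {
        StringBuilder sb = new StringBuilder();
        // Assumption: any run of non-letter, non-digit characters separates words.
        for (String token : sentence.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty()) {
                continue;
            }
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(canonical.getOrDefault(token, token));
        }
        return sb.toString();
    }

    // Store the answer under the normalized form of the question.
    void addAnswer(String question, String answer) {
        answers.put(normalize(question), answer);
    }

    // Returns null when no stored question matches.
    String reply(String input) {
        return answers.get(normalize(input));
    }
}
```

With `addGroup("afraid", "scared", "frightened")` and `addAnswer("are you scared of the dark?", "yes")`, both "Are you frightened of the dark?" and "are you afraid of the dark" normalize to the same string and get "yes". Matching whole sentences is strict; a real bot would probably match on keywords instead.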

bronze45 asked Dec 01 '12 at 11:12


2 Answers

I would at least use the nifty feature known as object orientation:

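import java.util.TreeSet;

public class Word implements Comparable<Word> {
    private final String word;
    // TreeSet keeps the synonyms sorted for fast searching
    private final TreeSet<Word> synonyms = new TreeSet<Word>();

    public Word(final String word) {
        this.word = word;
    }

    public String getWord() {
        return word;
    }

    public TreeSet<Word> getSynonyms() {
        return synonyms;
    }

    public void addSynonym(final Word word) {
        synonyms.add(word);
    }

    @Override
    public int compareTo(final Word other) {
        // null words sort first; two null words compare as equal
        if (this.word == null) {
            return (other == null || other.getWord() == null) ? 0 : -1;
        }
        if (other == null || other.getWord() == null) {
            return 1;
        }
        return this.word.compareTo(other.getWord());
    }
}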

So we have a Word class with a TreeSet of synonyms (for fast searching). This could be populated, for instance, from a properties file like:

afraid=scared
hello=hey

and all the words could be stored in a TreeSet:

private final TreeSet<Word> allWords = new TreeSet<Word>();

// e.g. inside a load method; props is a java.util.Properties loaded from the file
for (String key : props.stringPropertyNames()) {
    Word word = new Word(key);
    Word synonym = new Word(props.getProperty(key));

    if (allWords.contains(word)) {
        allWords.ceiling(word).addSynonym(synonym); // find the stored instance of the word
    } else {
        word.addSynonym(synonym);
        allWords.add(word);
    }
}
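A minimal sketch of reading a `word=synonym` file like the one above with `java.util.Properties`. It records each pair in both directions so a lookup works from either word. `SynonymFileLoader`, `load` and `loadString` are hypothetical names, and plain `String` sets stand in for the `Word` class to keep it short. `Properties.load(Reader)` reads characters rather than bytes, which matters for non-English words (`load(InputStream)` assumes ISO-8859-1).

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical loader for a "word=synonym" properties file.
final class SynonymFileLoader {
    static Map<String, Set<String>> load(Reader reader) {
        Properties props = new Properties();
        try {
            // load(Reader) works on characters, so non-Latin words survive intact
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, Set<String>> synonyms = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            String value = props.getProperty(key).trim();
            // Record the pair in both directions so lookups work either way.
            synonyms.computeIfAbsent(key, k -> new TreeSet<>()).add(value);
            synonyms.computeIfAbsent(value, k -> new TreeSet<>()).add(key);
        }
        return synonyms;
    }

    // Convenience overload for in-memory text.
    static Map<String, Set<String>> loadString(String text) {
        return load(new StringReader(text));
    }
}
```

A key can appear only once in a properties file (a later line replaces an earlier one), so to give one word several synonyms, put it on the right-hand side: `scared=afraid` and `frightened=afraid` both end up in the set for "afraid".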

It would need some improvement. One open question is how to store the words: should every word be stored in allWords, or just one entry per group of synonyms? It might also be better to use some kind of TreeMap, like

final TreeMap<Word, List<Word>> allWords;

but still, it might point you in the right direction. This is just off the top of my head, anyway.
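One way to settle the storage question is to keep one set per group and map every word to the same set instance, so adding a link between two groups merges them for all members. This is a simple form of union-find, swapped in here as an illustration, and it assumes synonymy is transitive (if a~b and b~c, then a~c), which natural language does not always respect. `SynonymGroups` is a hypothetical name, and `String` replaces the `Word` class for brevity.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical structure: every word points at the shared set of its group.
final class SynonymGroups {
    private final Map<String, Set<String>> groupOf = new TreeMap<>();

    void addSynonym(String a, String b) {
        Set<String> ga = groupOf.computeIfAbsent(a, k -> new TreeSet<>(Collections.singleton(k)));
        Set<String> gb = groupOf.computeIfAbsent(b, k -> new TreeSet<>(Collections.singleton(k)));
        if (ga == gb) {
            return; // already in the same group
        }
        // Merge the smaller group into the larger one and repoint its members.
        Set<String> big = ga.size() >= gb.size() ? ga : gb;
        Set<String> small = (big == ga) ? gb : ga;
        big.addAll(small);
        for (String w : small) {
            groupOf.put(w, big);
        }
    }

    boolean areSynonyms(String a, String b) {
        Set<String> g = groupOf.get(a);
        return g != null && g.contains(b);
    }

    // Read-only view of the word's group (including the word itself).
    Set<String> synonymsOf(String word) {
        Set<String> g = groupOf.get(word);
        return g == null ? Collections.<String>emptySet() : Collections.unmodifiableSet(g);
    }
}
```

After `addSynonym("afraid", "scared")` and `addSynonym("scared", "frightened")`, all three words share one set, so `areSynonyms("afraid", "frightened")` is true without storing that pair explicitly.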

Tobb answered Oct 05 '22 at 23:10


The best idea for you is to store the synonyms in a text file (or in a database), then query that data set to obtain matching results.
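A minimal sketch of the text-file option, assuming a hypothetical format with one comma-separated synonym group per line (e.g. `afraid,scared,frightened`). Each word gets the index of its line, and two words mean the same thing when their indices match. `SynonymTextFile` and its methods are illustrative names; for a real UTF-8 file you could pass `Files.newBufferedReader(path, StandardCharsets.UTF_8)` as the reader.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical loader: one comma-separated synonym group per line.
final class SynonymTextFile {
    // Maps each word to the index of its group; same index = same meaning.
    static Map<String, Integer> load(Reader source) {
        Map<String, Integer> groupId = new HashMap<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            int id = 0;
            while ((line = in.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty()) {
                    continue; // skip blank lines
                }
                for (String word : line.split(",")) {
                    groupId.put(word.trim(), id); // a later line wins for repeated words
                }
                id++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return groupId;
    }

    static Map<String, Integer> parse(String text) {
        return load(new StringReader(text));
    }

    static boolean sameMeaning(Map<String, Integer> groups, String a, String b) {
        Integer ga = groups.get(a);
        return ga != null && ga.equals(groups.get(b));
    }
}
```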

Below is a database model diagram for it:

[Figure: Database structure]

You can query the refSynomy table, which links pairs of word ids from the synomy table, to obtain the synonyms.

The DDL for this structure in PostgreSQL would be:

CREATE TABLE "testing"."synomy" (
    "idSynomy" int2 NOT NULL,
    "word" text NOT NULL,
    CONSTRAINT "synomy_pkey" PRIMARY KEY ("idSynomy") NOT DEFERRABLE INITIALLY IMMEDIATE
)
WITH (OIDS=FALSE);
ALTER TABLE "testing"."synomy" OWNER TO "dulitharasangawijewantha";
CREATE UNIQUE INDEX "synomy_idSynomy_key" ON "testing"."synomy" USING btree("idSynomy" ASC NULLS LAST);

CREATE TABLE "testing"."refSynomy" (
    "idSynomyref" int2 NOT NULL,
    "refSynomy" int2 NOT NULL,
    CONSTRAINT "refSynomy_pkey" PRIMARY KEY ("idSynomyref", "refSynomy") NOT DEFERRABLE INITIALLY IMMEDIATE,
    CONSTRAINT "refSynomy" FOREIGN KEY ("refSynomy") REFERENCES "testing"."synomy" ("idSynomy") ON UPDATE NO ACTION ON DELETE NO ACTION NOT DEFERRABLE INITIALLY IMMEDIATE,
    CONSTRAINT "idSynomy" FOREIGN KEY ("idSynomyref") REFERENCES "testing"."synomy" ("idSynomy") ON UPDATE NO ACTION ON DELETE NO ACTION NOT DEFERRABLE INITIALLY IMMEDIATE
)
WITH (OIDS=FALSE);
ALTER TABLE "testing"."refSynomy" OWNER TO "dulitharasangawijewantha";
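To show what querying refSynomy amounts to, here is a minimal in-memory Java sketch of the two tables with no database involved: synomy becomes an id-to-word map and refSynomy a list of id pairs. It assumes a pair may be stored in either direction, so the lookup checks both columns; in SQL this corresponds to joining refSynomy to synomy. `SynonymTables` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical in-memory stand-in for the synomy / refSynomy tables.
final class SynonymTables {
    // "synomy" table: idSynomy -> word
    private final Map<Integer, String> synomy = new HashMap<>();
    // "refSynomy" table: rows of (idSynomyref, refSynomy)
    private final List<int[]> refRows = new ArrayList<>();

    void insertWord(int id, String word) {
        synomy.put(id, word);
    }

    void insertRef(int wordId, int refId) {
        refRows.add(new int[] {wordId, refId});
    }

    // Like a join of refSynomy to synomy, matching the word's id in either column.
    TreeSet<String> synonymsOf(String word) {
        TreeSet<String> result = new TreeSet<>();
        for (Map.Entry<Integer, String> e : synomy.entrySet()) {
            if (!e.getValue().equals(word)) {
                continue;
            }
            int id = e.getKey();
            for (int[] row : refRows) {
                if (row[0] == id) {
                    result.add(synomy.get(row[1]));
                } else if (row[1] == id) {
                    result.add(synomy.get(row[0]));
                }
            }
        }
        return result;
    }
}
```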

Why I suggest using a small database:

  • Easier to manage as the application grows
  • Useful if you want to introduce more features, such as antonyms
  • Efficient, since the database can index the words for fast lookups

You can use your initial idea of storing them in arrays, but that would soon become hard to maintain, so my suggestion is a database. If you want your application to be portable, you can go for SQLite, so that the database lives inside a single file. Hope this helps.

Chan answered Oct 06 '22 at 01:10