
What is the effective method to handle word contractions using Java?

Tags: java, solr, nlp

I have a list of words in a file. Some of them are contractions such as who's and didn't. When reading the file, I need to expand them to their full forms, such as "who is" and "did not". This has to be done in Java, and it needs to be fast.

This is for handling such queries during a search that uses Solr.

Below is the sample code I tried, using a hash map:

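import java.util.HashMap;
import java.util.Map;

public class Contractions {
    public static void main(String[] args) {
        // Contraction suffix -> expansion (the leading space separates the two words).
        Map<String, String> con = new HashMap<String, String>();
        con.put("'s", " is");     // note: also matches possessives such as "John's"
        con.put("'d", " would");
        con.put("'re", " are");
        con.put("'ll", " will");
        con.put("n't", " not");
        con.put("'nt", " not");   // common misspelling, e.g. "would'nt"

        String str = "where'd you're you'll would'nt hello";
        String[] words = str.split(" ");

        // Visit every word; a word without a contraction must not end the loop.
        for (int i = 0; i < words.length; i++) {
            for (Map.Entry<String, String> e : con.entrySet()) {
                String suffix = e.getKey();
                // Match on the whole suffix so that "n't" (as in "didn't") is found too.
                if (words[i].endsWith(suffix)) {
                    words[i] = words[i].substring(0, words[i].length() - suffix.length()) + e.getValue();
                    break;
                }
            }
            System.out.println(words[i]);
        }
    }
}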
Varshith asked Mar 11 '26 12:03



1 Answer

If your concern is that a query containing, for example, "who's" should find documents containing "who is", then you should look at using a stemmer, which is designed for exactly this purpose.

You can easily add a stemmer by configuring it as a filter in your Solr config. See http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

Edit:
A SnowballPorterFilterFactory will probably do the job for you.
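
If the expansion still has to happen in plain Java before the text reaches Solr, one possible approach is a whole-word lookup for irregular forms combined with the suffix rules from the question. The following is only a minimal sketch under that assumption, not a Solr or Lucene API: the names ContractionExpander, IRREGULAR, SUFFIX_RULES, expandWord and expand are hypothetical, and both tables are illustrative rather than complete.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Solr or Lucene.
class ContractionExpander {
    // Whole-word exceptions that the suffix rules would get wrong (illustrative, not exhaustive).
    private static final Map<String, String> IRREGULAR = new HashMap<>();
    static {
        IRREGULAR.put("can't", "cannot");
        IRREGULAR.put("won't", "will not");
        IRREGULAR.put("shan't", "shall not");
    }

    // Suffix -> expansion, the same idea as the map in the question.
    private static final String[][] SUFFIX_RULES = {
        {"n't", " not"}, {"'re", " are"}, {"'ll", " will"}, {"'d", " would"}, {"'s", " is"}
    };

    // Expands a single word; words without a known contraction are returned unchanged.
    public static String expandWord(String word) {
        String lower = word.toLowerCase(Locale.ROOT);
        String whole = IRREGULAR.get(lower);
        if (whole != null) {
            return whole;
        }
        for (String[] rule : SUFFIX_RULES) {
            if (lower.endsWith(rule[0])) {
                // Expanded words come back lower-cased.
                return lower.substring(0, lower.length() - rule[0].length()) + rule[1];
            }
        }
        return word;
    }

    // Expands every whitespace-separated word in the text.
    public static String expand(String text) {
        StringBuilder out = new StringBuilder();
        for (String word : text.trim().split("\\s+")) {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(expandWord(word));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(expand("who's didn't can't hello"));
    }
}
```

Each word costs one hash lookup plus a handful of endsWith checks, so the time per word is small. Note that the "'s" rule cannot tell "who's" from a possessive such as "John's", and "'d" may mean "had" or "did" rather than "would".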

Qwerky answered Mar 13 '26 03:03



