Java: Search in HashMap keys based on regex?

Tags:

I'm building a thesaurus using a HashMap to store the synonyms.

I'm trying to search through the words based on a regular expression: the method will have to take a string as parameter and return an array of results. Here's my first stab at it:

public ArrayList<String> searchDefinition(String regex) {
    ArrayList<String> results = new ArrayList<String>();

    Pattern p = Pattern.compile(regex);

    Set<String> keys = thesaurus.keySet();
    Iterator<String> ite = keys.iterator();

    while (ite.hasNext()) {
        String candidate = ite.next();
        Matcher m = p.matcher(candidate);
        System.out.println("Attempting to match: " + candidate + " to "  + regex);
        if (m.matches()) {
            System.out.println("it matches");
            results.add(candidate);
        }
    }   

    if (results.isEmpty()) {
        return null;
    }
    else {
        return results;
    }
}

Now, this does not work as I would expect (or maybe I'm using regular expressions incorrectly). If I have the following keys in the hashmap:

cat, car, chopper

then by calling searchDefinition("c") or searchDefinition("c*") I get null.

How do I make this work as expected?
Is there a better data structure than HashMap to keep a graph like needed by a thesaurus? (curiosity only, as for this assignment we're asked to use Java Collection Map).
Anything else I'm doing innapropriately in the code above?

Thanks, Dan

EDIT: I've corrected the example. It doesn't work even if I use the correct case.

639

asked May 18 '09 20:05

Dan Burzo

2 Answers

You need to specify case insensitivity Pattern.compile( "c",Pattern.CASE_INSENSITIVE ). To find a word with a c in it you need to use matcher.find(). Matcher.matches() tries to match the whole string.

160

answered Oct 16 '22 09:10

Clint

But, hmm:

(a) Why would you use a HashMap if you intend to always search it sequentially? That's a lot of wasted overhead to process the hash keys and all when you never use them. Surely a simple ArrayList or LinkedList would be a better idea.

(b) What does this have to do with a thesaurus? Why would you search a thesaurus using regular expressions? If I want to know synonyms for, say, "cat", I would think that I would search for "cat", not "c.*".

My first thought on how to build a thesaurus would be ... well, I guess the first question I'd ask is, "Is synonym an equivalance relationship?", i.e. if A is a synonym for B, does it follow that B is a synonym for A? And if A is a synonym for B and B is a synonym for C, then is A a synonym for C? Assuming the answers to these questions are "yes", then what we want to build is something that divides all the words in the language into sets of synonyms, so we then can map any word in each set to all the other words in that set. So what you need is a way to take any word, map it to some sort of nexus point, and then go from that nexus point to all of the words that map to it.

This would be straightforward on a database: Just create a table with two columns, say "word" and "token", each with its own index. All synonyms map to the same token. The token can be anything as long as its unique for any given set of synonyms, like a sequence number. Then search for the given word, find the associated token, and then get all the words with that token. For example we might create records with (big,1), (large,1), (gigantic,1), (cat,2), (feline,2), etc. Search for "big" and you get 1, then search for 1 and you get "big", "large", and "giant".

I don't know any class in the built-in Java collections that does this. The easiest way I can think of is to build two co-ordinated hash tables: One that maps words to tokens, and another that maps tokens to an array of words. So table 1 might have big->1, large->1, gigantic->1, cat->2, feline->2, etc. Then table 2 maps 1->[big,large,gigantic], 2->[cat,feline], etc. You look up in the first table to map a word to a token, and in the second to map that token back to a list of words. It's clumsy because all the data is stored redundantly, maybe there's a better solution but I'm not getting it off the top of my head. (Well, it would be easy if we assume that we're going to sequentially search the entire list of words every time, but performance would suck as the list got big.)

answered Oct 16 '22 08:10

Jay

Related questions
                            
                                How to catch all exception in netty
                            
                                How to update xml files in java
                            
                                In java, Can we override a method by passing subclass of the parameter used in super class method?
                            
                                Java 8 lambdas execution
                            
                                How to call a stored procedure in JDBC
                            
                                Bad service configuration file, or exception thrown while constructing Processor object
                            
                                Java JDK BitSet vs Lucene OpenBitSet
                            
                                Error while installing dbeaver - Java error code = 13
                            
                                Eclipse Gradle build "Could not find tools.jar"
                            
                                Thymeleaf Loop Until a Number
                            
                                Adding elements to different collections in a single lambda expression
                            
                                Java - switch case, Multiple cases call the same function
                            
                                Could not resolve substitution to a value: ${akka.stream.materializer} in AWS Lambda
                            
                                Maven is unable to find javadoc command
                            
                                Spring Security >5.0.0 removed Md5PasswordEncoder
                            
                                Unable to send email using mailR package
                            
                                How to merge two streams in Java?
                            
                                Java how to avoid using Thread.sleep() in a loop
                            
                                How to obtain a kerberos service ticket via GSS-API?
                            
                                Changing the values of variables in methods, Java [duplicate]

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Java: Search in HashMap keys based on regex?

Tags:

java

regex

hashmap

Dan Burzo

People also ask

2 Answers

Clint

Jay

Recent Activity

Donate For Us