 

Java Searching String Contents for partial match

I'm working on a project where I need to search a paragraph of text for a particular string. However, I don't need an exact match, more of a % match.

For example, here is the paragraph of text I'm searching:

Fluticasone Propionate Nasal Spray, USP 50 mcg per spray is a 
corticosteroid indicated for the management of the nasal symptoms of 
perennial nonallergic rhinitis in adult and pediatric patients aged 4 years 
and older.

And then I'm searching to see if any words in the following lines match the paragraph:

1) Unspecified acute lower respiratory infection
2) Vasomotor rhinitis
3) Allergic rhinitis due to pollen
4) Other seasonal allergic rhinitis
5) Allergic rhinitis due to food
6) Allergic rhinitis due to animal (cat) (dog) hair and dander
7) Other allergic rhinitis
8) Allergic rhinitis, unspecified
9) Chronic rhinitis
10) Chronic nasopharyngitis

My initial approach was to use String.contains and store the result in a boolean:

boolean found = med[x].toLowerCase().contains(condition[y].toLowerCase());

However, this returns false on every pass through the loop.
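A minimal reproduction of the problem, assuming med[x] holds the paragraph as a single string and condition[y] holds one line of the list (the class name ContainsDemo is hypothetical). contains only succeeds when the entire condition appears as one contiguous run of characters in the paragraph, and none of the ten lines do.

```java
// Hypothetical class reproducing the question's String.contains approach.
class ContainsDemo {
    // The paragraph from the question, assumed to be stored as a single string.
    static final String PARAGRAPH =
            "Fluticasone Propionate Nasal Spray, USP 50 mcg per spray is a "
            + "corticosteroid indicated for the management of the nasal symptoms of "
            + "perennial nonallergic rhinitis in adult and pediatric patients aged 4 years "
            + "and older.";

    public static void main(String[] args) {
        String text = PARAGRAPH.toLowerCase();
        // contains() looks for the whole condition as one contiguous substring...
        System.out.println(text.contains("Vasomotor rhinitis".toLowerCase())); // false
        // ...so it only succeeds when that exact run of words occurs in the paragraph.
        System.out.println(text.contains("nonallergic rhinitis"));             // true
    }
}
```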

The results I expect would be:

1) False
2) True
3) True
4) True
5) True
6) True
7) True
8) True
9) True
10) False

I'm very new to Java and its methods. Basically, if any word in A matches any word in B, I want to flag it as true. How do I do that?

Thanks!

asked Dec 06 '25 18:12 by Fxguy1


1 Answer

You have to tokenize one of the strings first. What you are doing now is checking whether the whole condition line appears verbatim in the paragraph.

Something like this should work:

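import java.util.Arrays;

// Lower-case the paragraph once, then test each word of the condition against it.
String text = med[x].toLowerCase();
boolean found =
  Arrays.stream(condition[y].split(" "))      // split the condition line into words
      .map(String::toLowerCase)
      .map(s -> s.replaceAll("\\W", ""))      // strip punctuation
      .filter(s -> !s.isEmpty())              // drop tokens that became empty
      .anyMatch(text::contains);              // true if any word occurs in the paragraph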

I've added removal of punctuation characters and of blank strings, so that they don't produce false matches (an empty string is contained in every string). Note that \\W actually removes every character that is not in [A-Za-z_0-9], so change the regex to whatever suits your data.
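For reference, here is the snippet above wrapped in a self-contained class so it can be run directly. The names SubstringWordMatch and anyWordContained are hypothetical, and the paragraph is assumed to be a single string. Because contains does a substring match, a short word can also be found inside a longer one: "to" matches inside "symptoms", which is worth keeping in mind before relying on this version.

```java
import java.util.Arrays;

// Hypothetical wrapper around the substring-based snippet.
class SubstringWordMatch {
    static final String PARAGRAPH =
            "Fluticasone Propionate Nasal Spray, USP 50 mcg per spray is a "
            + "corticosteroid indicated for the management of the nasal symptoms of "
            + "perennial nonallergic rhinitis in adult and pediatric patients aged 4 years "
            + "and older.";

    // True if any word of the condition occurs anywhere inside the paragraph.
    static boolean anyWordContained(String paragraph, String condition) {
        String text = paragraph.toLowerCase();
        return Arrays.stream(condition.split(" "))
                .map(String::toLowerCase)
                .map(s -> s.replaceAll("\\W", ""))  // strip punctuation such as ( ) , "
                .filter(s -> !s.isEmpty())          // "" would be contained in every string
                .anyMatch(text::contains);          // substring test, not whole-word
    }

    public static void main(String[] args) {
        System.out.println(anyWordContained(PARAGRAPH, "Vasomotor rhinitis"));      // true
        System.out.println(anyWordContained(PARAGRAPH, "Chronic nasopharyngitis")); // false
        // Substring caveat: "to" is found inside "symptoms", so this is true too.
        System.out.println(anyWordContained(PARAGRAPH, "due to"));                  // true
    }
}
```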

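If you need this to be efficient because you have a lot of text, you might want to turn it around: tokenize the paragraph into a Set, which has a faster lookup, and check each word of the condition against it. This matches whole words only, so "allergic" will no longer match inside "nonallergic".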

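import java.util.Arrays;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Split on spaces, lower-case, strip punctuation and drop empty tokens.
// The lambda parameter must not reuse the method parameter's name.
private Stream<String> tokenize(String text) {
   return Arrays.stream(text.split(" "))
                .map(String::toLowerCase)
                .map(w -> w.replaceAll("\\W", ""))
                .filter(w -> !w.isEmpty());
}

// Build the set of paragraph words once so each lookup is fast.
Set<String> words = tokenize(med[x]).collect(Collectors.toSet());

// Whole-word match: true if any word of the condition is a word of the paragraph.
boolean found = tokenize(condition[y]).anyMatch(words::contains);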
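A runnable sketch of the Set-based version, applied to the ten conditions from the question (the class and method names are hypothetical). It assumes the paragraph is one string and splits on any whitespace ("\\s+") rather than a single space, so line breaks inside the paragraph are also treated as separators. With whole-word matching it prints false for 1) and 10) and true for the rest, which is the result the question expects; unlike the substring version, "due to" no longer matches.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical class: whole-word matching via a Set of paragraph words.
class WordSetMatch {
    static final String PARAGRAPH =
            "Fluticasone Propionate Nasal Spray, USP 50 mcg per spray is a "
            + "corticosteroid indicated for the management of the nasal symptoms of "
            + "perennial nonallergic rhinitis in adult and pediatric patients aged 4 years "
            + "and older.";

    // The ten lines from the question.
    static final String[] CONDITIONS = {
        "Unspecified acute lower respiratory infection",
        "Vasomotor rhinitis",
        "Allergic rhinitis due to pollen",
        "Other seasonal allergic rhinitis",
        "Allergic rhinitis due to food",
        "Allergic rhinitis due to animal (cat) (dog) hair and dander",
        "Other allergic rhinitis",
        "Allergic rhinitis, unspecified",
        "Chronic rhinitis",
        "Chronic nasopharyngitis"
    };

    // Split on any whitespace, lower-case, strip non-word characters, drop empties.
    static Stream<String> tokenize(String text) {
        return Arrays.stream(text.split("\\s+"))
                .map(String::toLowerCase)
                .map(w -> w.replaceAll("\\W", ""))
                .filter(w -> !w.isEmpty());
    }

    // True if any word of the condition is also a word of the paragraph.
    static boolean anyWordMatches(Set<String> paragraphWords, String condition) {
        return tokenize(condition).anyMatch(paragraphWords::contains);
    }

    public static void main(String[] args) {
        // Build the word set once and reuse it for every condition.
        Set<String> words = tokenize(PARAGRAPH).collect(Collectors.toSet());
        for (int i = 0; i < CONDITIONS.length; i++) {
            System.out.println((i + 1) + ") " + anyWordMatches(words, CONDITIONS[i]));
        }
    }
}
```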

You might also want to filter out stop words, such as "to", "and", etc. You could use the list here and add an extra filter, after the one that checks for blank strings, that checks the token is not a stop word.
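A sketch of that extra stop-word filter, assuming a small hand-picked stop-word set (STOP_WORDS below is illustrative only, not a standard list; Set.of needs Java 9 or later). Without the filter, the fragment "hair and dander" from condition 6) would match the paragraph through "and"; with it, only content words count.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical class: whole-word matching that ignores stop words.
class StopWordMatch {
    static final String PARAGRAPH =
            "Fluticasone Propionate Nasal Spray, USP 50 mcg per spray is a "
            + "corticosteroid indicated for the management of the nasal symptoms of "
            + "perennial nonallergic rhinitis in adult and pediatric patients aged 4 years "
            + "and older.";

    // Illustrative stop words only (an assumption, not a standard list).
    static final Set<String> STOP_WORDS =
            Set.of("a", "an", "and", "the", "to", "of", "in", "for", "is", "per");

    static Stream<String> tokenize(String text) {
        return Arrays.stream(text.split("\\s+"))
                .map(String::toLowerCase)
                .map(w -> w.replaceAll("\\W", ""))
                .filter(w -> !w.isEmpty())
                .filter(w -> !STOP_WORDS.contains(w)); // extra filter: skip stop words
    }

    static boolean anyWordMatches(String paragraph, String condition) {
        Set<String> words = tokenize(paragraph).collect(Collectors.toSet());
        return tokenize(condition).anyMatch(words::contains);
    }

    public static void main(String[] args) {
        // Shares only "and" with the paragraph, which is now ignored.
        System.out.println(anyWordMatches(PARAGRAPH, "hair and dander"));    // false
        System.out.println(anyWordMatches(PARAGRAPH, "Vasomotor rhinitis")); // true
    }
}
```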

answered Dec 09 '25 12:12 by jbx