I'm working on a project where I need to search a paragraph of text for a particular string. However, I don't need an exact match; a partial (percentage-style) match is enough.
For example, here is the paragraph of text I'm searching:
Fluticasone Propionate Nasal Spray, USP 50 mcg per spray is a
corticosteroid indicated for the management of the nasal symptoms of
perennial nonallergic rhinitis in adult and pediatric patients aged 4 years
and older.
And then I'm searching to see if any words in the following lines match the paragraph:
1) Unspecified acute lower respiratory infection
2) Vasomotor rhinitis
3) Allergic rhinitis due to pollen
4) Other seasonal allergic rhinitis
5) Allergic rhinitis due to food
6) Allergic rhinitis due to animal (cat) (dog) hair and dander
7) Other allergic rhinitis
8) "Allergic rhinitis, unspecified"
9) Chronic rhinitis
10) Chronic nasopharyngitis
My initial approach was to use a boolean and contains:
boolean found = med[x].toLowerCase().contains(condition[y].toLowerCase());
However, the result is false on every pass through the loop.
The results I expect would be:
1) False
2) True
3) True
4) True
5) True
6) True
7) True
8) True
9) True
10) False
I'm very new to Java and its methods. Basically, if any word in A matches any word in B, then flag it as true. How do I do that?
Thanks!
You first have to tokenize one of the strings. What you are doing now tries to match the whole line as a single substring of the paragraph.
Something like this should work:
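// needs: import java.util.Arrays;
String text = med[x].toLowerCase();
boolean found =
    Arrays.stream(condition[y].split(" "))
        .map(String::toLowerCase)
        .map(s -> s.replaceAll("\\W", ""))  // strip punctuation
        .filter(s -> !s.isEmpty())          // drop tokens that became empty
        .anyMatch(text::contains);          // true if any word occurs in the paragraph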
I've added the removal of punctuation characters and of any blank strings, so that we don't get false matches on those. (The regex \W, written "\\W" in a Java string, actually removes every character that is not in [A-Za-z_0-9], but you can change it to whatever you like.)
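To see what the clean-up step does to a line, here is a minimal sketch that only prints the resulting tokens. The class name NormalizeDemo and the helper normalize are made up for illustration; the pipeline itself is the same split / toLowerCase / replaceAll / filter chain as above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class NormalizeDemo {
    // Hypothetical helper: splits a line on spaces, lower-cases each token,
    // strips every non-word character, and drops tokens that end up empty.
    static List<String> normalize(String line) {
        return Arrays.stream(line.split(" "))
                .map(String::toLowerCase)
                .map(w -> w.replaceAll("\\W", ""))
                .filter(w -> !w.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Quotes and the comma disappear
        System.out.println(normalize("\"Allergic rhinitis, unspecified\""));
        // Parentheses disappear, so "(cat)" becomes "cat"
        System.out.println(normalize("Allergic rhinitis due to animal (cat) (dog) hair and dander"));
    }
}
```

Note that short words such as "to" and "and" survive this step, which matters for the matching.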
If you need this to be efficient because you have a lot of text, you might want to turn it around and put the paragraph's words into a Set, which has faster lookup.
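// needs: import java.util.Arrays; import java.util.stream.Stream;
private Stream<String> tokenize(String s) {
    return Arrays.stream(s.split(" "))
        .map(String::toLowerCase)
        // the lambda parameter must not be named s: it would clash with the method parameter
        .map(w -> w.replaceAll("\\W", ""))
        .filter(w -> !w.isEmpty());
}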
Set<String> words = tokenize(med[x]).collect(Collectors.toSet());
boolean found = tokenize(condition[y]).anyMatch(words::contains);
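Put together as a runnable program, the Set approach could look roughly like the sketch below. The class name WordOverlapDemo, the MED constant, and the helpers wordSet and anyWordMatches are made up for illustration, and the data is copied from the question (without the list numbering).

```java
import java.util.Arrays;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class WordOverlapDemo {
    // The paragraph from the question, joined into one string
    static final String MED =
            "Fluticasone Propionate Nasal Spray, USP 50 mcg per spray is a "
            + "corticosteroid indicated for the management of the nasal symptoms of "
            + "perennial nonallergic rhinitis in adult and pediatric patients aged 4 years "
            + "and older.";

    static Stream<String> tokenize(String s) {
        return Arrays.stream(s.split(" "))
                .map(String::toLowerCase)
                .map(w -> w.replaceAll("\\W", ""))
                .filter(w -> !w.isEmpty());
    }

    // Build the lookup set once per paragraph
    static Set<String> wordSet(String text) {
        return tokenize(text).collect(Collectors.toSet());
    }

    // True if at least one word of the phrase is in the set
    static boolean anyWordMatches(Set<String> words, String phrase) {
        return tokenize(phrase).anyMatch(words::contains);
    }

    public static void main(String[] args) {
        String[] conditions = {
            "Unspecified acute lower respiratory infection",
            "Vasomotor rhinitis",
            "Allergic rhinitis due to pollen",
            "Other seasonal allergic rhinitis",
            "Allergic rhinitis due to food",
            "Allergic rhinitis due to animal (cat) (dog) hair and dander",
            "Other allergic rhinitis",
            "\"Allergic rhinitis, unspecified\"",
            "Chronic rhinitis",
            "Chronic nasopharyngitis"
        };
        Set<String> words = wordSet(MED);
        for (String condition : conditions) {
            System.out.println(condition + " -> " + anyWordMatches(words, condition));
        }
    }
}
```

With this data, every line except the first and the last is flagged true, because each of the others contains the word "rhinitis". One difference from the first version: the Set only matches whole words, whereas text::contains is a substring test, so "allergic" would also be found inside "nonallergic" there.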
You might also want to filter out stop words, such as "to" and "and", so that they cannot produce a match on their own.
You could take a stop-word list and add an extra filter after the one that checks for blank strings, to check that the string is not a stop word.
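As a rough sketch of that extra filter: the STOP_WORDS set below is a deliberately tiny, made-up list just for illustration (a real one would be much longer), and the class and method names are hypothetical as well.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class StopWordDemo {
    // Illustrative stop-word list only; replace it with a proper one
    static final Set<String> STOP_WORDS = new HashSet<>(
            Arrays.asList("a", "and", "for", "in", "is", "of", "the", "to"));

    static Stream<String> tokenize(String s) {
        return Arrays.stream(s.split(" "))
                .map(String::toLowerCase)
                .map(w -> w.replaceAll("\\W", ""))
                .filter(w -> !w.isEmpty())
                .filter(w -> !STOP_WORDS.contains(w)); // the extra stop-word filter
    }

    static boolean anyWordMatches(String text, String phrase) {
        Set<String> words = tokenize(text).collect(Collectors.toSet());
        return tokenize(phrase).anyMatch(words::contains);
    }

    public static void main(String[] args) {
        String text = "nonallergic rhinitis in adult and pediatric patients";
        // "and" is the only shared word, and it is filtered out
        System.out.println(anyWordMatches(text, "hair and dander"));
        // "rhinitis" is shared and is not a stop word
        System.out.println(anyWordMatches(text, "Chronic rhinitis"));
    }
}
```

Without the stop-word filter, the first call would report a match purely because both strings contain "and".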