I have a dictionary as input. I iterate over it and replace each key found in the text with its value, but replaceAll also replaces the key when it occurs as a substring of a longer token.
How can I ensure that the key is matched only as a whole word, and not as a substring?
String text = "Synthesis of 1-(2,6-dimethylbenzyl)-1H-indole-6-carboxylic acid [69-3] The titled compound (883 mg) sdvfshd[69-3]3456 as a white solid was prepared";
// dictionary contents: {[69-3]=1-(2,6-dimethylbenzyl)-1H-indole-6-carboxylic acid}
for (Map.Entry<String, String> entry : dictionary.entrySet()) {
    text = text.replaceAll("\\b" + Pattern.quote(entry.getKey()) + "\\b", entry.getValue());
}
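replaceAll takes a regular expression as its first parameter.
In regular expressions, you have word boundaries: \b (use \\b in a string literal). They're the best way to ensure you're matching a word and not a part of a word: "\\bword\\b"
But in your case, you can't use word boundaries, as you're not looking for a word ([69-3] isn't a word). \b only matches between a word character and a non-word character, and [ and ] are non-word characters, so a \b in front of [69-3] matches only when the key is glued to a preceding letter or digit, which is exactly the case you want to exclude.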
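To see the effect described above, here is a small sketch that applies the question's \b approach to the two shapes found in the sample text. The class and method names are made up for illustration, and "X" is a placeholder replacement.

```java
import java.util.regex.Pattern;

class WordBoundaryDemo {
    // Applies the question's approach: the quoted key wrapped in \b on both sides.
    public static String replaceWithWordBoundaries(String text, String key, String value) {
        return text.replaceAll("\\b" + Pattern.quote(key) + "\\b", value);
    }

    public static void main(String[] args) {
        // Standalone key: a space precedes '[', so the leading \b cannot match.
        System.out.println(replaceWithWordBoundaries("acid [69-3] The", "[69-3]", "X"));
        // Embedded key: 'd' before '[' and '3' after ']' satisfy both \b assertions.
        System.out.println(replaceWithWordBoundaries("sdvfshd[69-3]3456", "[69-3]", "X"));
    }
}
```

The first call leaves the text unchanged and the second replaces the embedded key, which is the opposite of what the question asks for.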
I suggest this:
text=text.replaceAll("(?<=\\W|^)"+Pattern.quote("[69-3]")+"(?=\\W|$)", ...
The idea is to require a non-word character or the start of the string before the key, and a non-word character or the end of the string after it. I can't guarantee this is the right solution for you, though: such a pattern must be tuned with knowledge of the exact, complete use case.
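As a rough sketch of that idea applied to the whole dictionary loop: the class and method names below are hypothetical, the sample text is shortened, and the use of Matcher.quoteReplacement is my own addition (it stops a '$' or '\' in a value from being read as a group reference).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WholeKeyReplace {
    // Replaces each key only where it is not glued to a word character on either side.
    public static String replaceWholeKeys(String text, Map<String, String> dictionary) {
        for (Map.Entry<String, String> entry : dictionary.entrySet()) {
            String regex = "(?<=\\W|^)" + Pattern.quote(entry.getKey()) + "(?=\\W|$)";
            // quoteReplacement makes the value a literal replacement string.
            text = text.replaceAll(regex, Matcher.quoteReplacement(entry.getValue()));
        }
        return text;
    }

    public static void main(String[] args) {
        Map<String, String> dictionary = new LinkedHashMap<>();
        dictionary.put("[69-3]", "1-(2,6-dimethylbenzyl)-1H-indole-6-carboxylic acid");
        String text = "carboxylic acid [69-3] The titled compound sdvfshd[69-3]3456 was prepared";
        // Only the standalone [69-3] is expanded; sdvfshd[69-3]3456 stays as it is.
        System.out.println(replaceWholeKeys(text, dictionary));
    }
}
```

Because the loop rewrites the text once per key, a value that itself contains another key could be rewritten by a later iteration; whether that matters depends on the dictionary.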
Note that if all your keys follow a similar pattern, there might be a better solution than iterating through the dictionary: you might, for example, use a single pattern like "(?<=\\W|^)\\[\\d+-\\d+\\](?=\\W|$)" and look up each match in the dictionary.
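A single-pass variant might look like the sketch below, assuming every key has the shape [digits-digits]. The class and method names are illustrative, and leaving unknown keys untouched is an assumption about the desired behaviour.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BracketKeyReplace {
    // One pattern for every key shaped like [digits-digits] that stands on its own.
    private static final Pattern KEY = Pattern.compile("(?<=\\W|^)\\[\\d+-\\d+\\](?=\\W|$)");

    public static String replaceKeys(String text, Map<String, String> dictionary) {
        Matcher m = KEY.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Keys missing from the dictionary are written back unchanged.
            String value = dictionary.getOrDefault(m.group(), m.group());
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> dictionary = new HashMap<>();
        dictionary.put("[69-3]", "1-(2,6-dimethylbenzyl)-1H-indole-6-carboxylic acid");
        System.out.println(replaceKeys("acid [69-3] and sdvfshd[69-3]3456 and [70-1]", dictionary));
    }
}
```

The text is scanned once regardless of the dictionary size, and inserted values are never rescanned for further keys.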
"\bword\b" is working for me.
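Sample code:
import java.util.regex.Pattern
for (row <- df.rdd.collect()) {
  // The first two comma-separated fields of each row are the key and its value.
  val fields = row.mkString(",").split(",")
  val config_key = fields(0)
  val config_value = fields(1)
  val rc_applied_hiveQuery = "select * from emp_details_Spark2 where empid_details= 'empid' limit 10"
  // \b leaves empid_details alone because '_' counts as a word character.
  val str_row = rc_applied_hiveQuery.replaceAll("\\b" + Pattern.quote(config_key) + "\\b", config_value)
  println(str_row)
}
Output (assuming a row with key empid and value 5): select * from emp_details_Spark2 where empid_details= '5' limit 10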
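For comparison with the Java code in the question, the same word-boundary replacement can be sketched in Java. The class and method names are hypothetical, and the key empid and value 5 are assumed from the output shown above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordKeyReplace {
    // Works for word-like keys: \b needs a word character at each edge of the key.
    public static String replaceWord(String text, String key, String value) {
        return text.replaceAll("\\b" + Pattern.quote(key) + "\\b", Matcher.quoteReplacement(value));
    }

    public static void main(String[] args) {
        String query = "select * from emp_details_Spark2 where empid_details= 'empid' limit 10";
        // empid_details is skipped because '_' is a word character, so no boundary follows "empid" there.
        System.out.println(replaceWord(query, "empid", "5"));
    }
}
```

This only works because the key starts and ends with word characters; a key such as [69-3] needs the lookaround approach instead.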