
How to ensure replaceAll will replace a whole word and not a subString

I have a dictionary as input. I iterate over it and replace every occurrence of each key in the text with its value, but replaceAll also replaces the key when it appears as a substring of a longer token.

How can I make sure a key is matched only as a whole word, and not as a substring?

String text = "Synthesis of 1-(2,6-dimethylbenzyl)-1H-indole-6-carboxylic acid [69-3] The titled compound (883 mg) sdvfshd[69-3]3456 as a white solid was prepared";

dictionary = {[69-3]=1-(2,6-dimethylbenzyl)-1H-indole-6-carboxylic acid}

for (Map.Entry<String, String> entry : dictionary.entrySet()) {
    text = text.replaceAll("\\b" + Pattern.quote(entry.getKey()) + "\\b", entry.getValue());
}
user2832203 asked Sep 09 '14 06:09


2 Answers

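replaceAll takes a regular expression as its first parameter.

Regular expressions have word boundaries: \b (written \\b in a Java string literal). They're the best way to ensure you match a whole word and not part of one: "\\bword\\b"

But in your case you can't use word boundaries, because you're not looking for a word ([69-3] isn't a word). \b only matches between a word character (letter, digit or underscore) and a non-word character. Since [ and ] are non-word characters, a \b next to them requires an adjacent word character, so your pattern matches inside sdvfshd[69-3]3456 and skips the standalone [69-3], which is the opposite of what you want.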
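The behaviour of \b described above can be checked with a small sketch. The class name BoundaryDemo and the helper replaceWithWordBoundary are made up for this illustration, and the sample strings are shortened versions of the text from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the question's code.
class BoundaryDemo {
    // Replaces key only where it sits between two \b word boundaries.
    static String replaceWithWordBoundary(String text, String key, String value) {
        String regex = "\\b" + Pattern.quote(key) + "\\b";
        // quoteReplacement keeps '$' and '\' in the value from being read as group references.
        return text.replaceAll(regex, Matcher.quoteReplacement(value));
    }

    public static void main(String[] args) {
        // A key made only of word characters: \b does what one expects.
        System.out.println(replaceWithWordBoundary("cat concat cat", "cat", "dog"));
        // prints: dog concat dog

        // A key that starts and ends with non-word characters: \b does the opposite.
        System.out.println(replaceWithWordBoundary("acid [69-3] sdvfshd[69-3]3456", "[69-3]", "X"));
        // prints: acid [69-3] sdvfshdX3456
    }
}
```

The second call leaves the standalone key alone and replaces the embedded one, because a boundary exists only where a word character touches the bracket.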

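I suggest this:

text=text.replaceAll("(?<=\\W|^)"+Pattern.quote("[69-3]")+"(?=\\W|$)", ...

The idea is to require the start of the string or a non-word character just before the key (a lookbehind), and a non-word character or the end of the string just after it (a lookahead). Neither is consumed, so the surrounding characters stay in the text. I can't guarantee this will be the right solution for you though: such a pattern must be tuned knowing the exact, complete use case.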
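Applied to the whole dictionary, the suggestion might look like the sketch below. It assumes a lookbehind before the key and a lookahead after it; the class name WholeKeyReplacer and the method replaceWholeKeys are hypothetical, and Matcher.quoteReplacement is added on the assumption that values may contain $ or \ characters.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration only.
class WholeKeyReplacer {
    // Replaces each dictionary key only when it is not glued to word characters.
    static String replaceWholeKeys(String text, Map<String, String> dictionary) {
        for (Map.Entry<String, String> entry : dictionary.entrySet()) {
            // Lookbehind: start of input or a non-word character before the key.
            // Lookahead: a non-word character or end of input after the key.
            String regex = "(?<=\\W|^)" + Pattern.quote(entry.getKey()) + "(?=\\W|$)";
            text = text.replaceAll(regex, Matcher.quoteReplacement(entry.getValue()));
        }
        return text;
    }

    public static void main(String[] args) {
        Map<String, String> dictionary = new LinkedHashMap<>();
        dictionary.put("[69-3]", "1-(2,6-dimethylbenzyl)-1H-indole-6-carboxylic acid");

        String text = "acid [69-3] The titled compound sdvfshd[69-3]3456 was prepared";
        // Only the standalone [69-3] is expanded; sdvfshd[69-3]3456 is left untouched.
        System.out.println(replaceWholeKeys(text, dictionary));
    }
}
```

Whether punctuation such as a hyphen or a parenthesis next to the key should count as a valid boundary depends on the data, so the two lookarounds are the part to tune.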

Note that if all your keys follow a similar pattern, there may be a better solution than iterating through the dictionary: you could, for example, use a single pattern such as "(?<=\\W|^)\\[\\d+-\\d+\\](?=\\W|$)" and look up each match in the dictionary.
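A single-pass variant of that idea could be sketched as follows, assuming every key has the shape [digits-digits]. The class name SinglePassReplacer and the method replaceKeys are invented for this example; matches without a dictionary entry are written back unchanged.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration only.
class SinglePassReplacer {
    // Assumed key shape: "[", digits, "-", digits, "]", not glued to word characters.
    private static final Pattern KEY =
            Pattern.compile("(?<=\\W|^)\\[\\d+-\\d+\\](?=\\W|$)");

    static String replaceKeys(String text, Map<String, String> dictionary) {
        Matcher m = KEY.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Unknown keys fall back to the matched text itself.
            String replacement = dictionary.getOrDefault(m.group(), m.group());
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> dictionary =
                java.util.Collections.singletonMap("[69-3]", "ACID");
        System.out.println(replaceKeys("a [69-3] b [70-1] c x[69-3]", dictionary));
        // prints: a ACID b [70-1] c x[69-3]
    }
}
```

The text is scanned once regardless of dictionary size, and a replacement value is never rescanned for other keys, which the per-entry loop cannot guarantee.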

Denys Séguret answered Sep 21 '22 06:09


"\\bword\\b" works for me when the key consists only of word characters.

Sample Code :

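for (row <- df.rdd.collect) {
  // Column 0 holds the config key, column 1 the value to substitute.
  val fields = row.mkString(",").split(",")
  val config_key = fields(0)
  val config_value = fields(1)
  val rc_applied_hiveQuery = "select * from emp_details_Spark2 where empid_details= 'empid' limit 10"
  // \b leaves "empid" inside "empid_details" alone, because '_' is a word character.
  val str_row = rc_applied_hiveQuery.replaceAll("\\b" + config_key + "\\b", config_value)
  println(str_row)
}

Output (for a row with key empid and value 5): select * from emp_details_Spark2 where empid_details= '5' limit 10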

Ashish answered Sep 18 '22 06:09