
Regex in Java to find patterns like ${...} in a given string

Tags: java, regex

I am new to regex.

I have a string like this:

String str = "sbs 01.00 sip ${dreamworks.values} print ${fwVer} to "
           + "used ${lang} en given ${model}  in ${region}";

and I have to extract every substring of the form ${...} from it.

For the given str, the result should be:

${dreamworks.values} 
${fwVer}   
${lang}
${model}
${region}

Furthermore, if a placeholder occurs more than once, it should be returned only once. For example, for:

String feed = "sip ${dreamworks.values} print ${fwVer} to ${fwVer} used "
            + "${lang} en ${lang}given ${model}  in ${region}";

the result should be only:

${dreamworks.values}
${fwVer}
${lang}
${model}
${region}

This is my attempt:

PLACEHOLDER_PATTERN = "\\$\\{\\w+\\}";

but it does not give the correct result. It returns only:

${fwVer}
${lang}
${model}
${region}

Please suggest the correct regex.

user1808653 asked Nov 08 '12 09:11


2 Answers

Your pattern does not account for the . inside the placeholder name: \w does not match a dot (.).

You need to change your pattern to:

PLACEHOLDER_PATTERN = "\\$\\{.+?\\}";

The dot (.) matches any character (except line terminators, by default), and that is what you want here.

I have also used a reluctant quantifier, .+?, so that the match stops at the first } after the {. With a greedy quantifier (.+), the dot would also consume every } along the way, and the match would run on to the last } in the string.
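To make the difference between the reluctant and the greedy quantifier concrete, here is a minimal sketch that runs both patterns over a shortened version of the string from the question. The class name ReluctantVsGreedy and the helper findAll are illustrative names, not part of the original post.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the name is only for illustration.
class ReluctantVsGreedy {

    // Collects every substring of input that the regex matches, in order.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String str = "sip ${dreamworks.values} print ${fwVer}";

        // Reluctant: each match stops at the first } after its ${
        System.out.println(findAll("\\$\\{.+?\\}", str));
        // prints [${dreamworks.values}, ${fwVer}]

        // Greedy: a single match that runs to the last } in the string
        System.out.println(findAll("\\$\\{.+\\}", str));
        // prints [${dreamworks.values} print ${fwVer}]
    }
}
```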


UPDATE:

To get just the unique values, you can use this pattern:

"(\\$\\{[^}]+\\})(?!.*?\\1)"

It matches a placeholder only if the same placeholder does not occur again later in the string, so for a repeated placeholder only its last occurrence is returned.

NOTE: Here I have used [^}]+ in place of .+?. [^}] matches any character except }, so in this case you don't need a reluctant quantifier.

\1 is a backreference to the first capturing group. Inside a Java string literal the backslash itself has to be escaped, hence \\1. (?!...) is a negative lookahead.
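As a rough sketch of how the lookahead pattern behaves, the snippet below applies it to the feed string from the question. The names UniquePlaceholders and extractUnique are made up for this example. Two caveats apply: the surviving match is the last occurrence of each placeholder, and because the dot does not cross line terminators by default, a duplicate on a later line would not be detected unless Pattern.DOTALL is set.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the name is only for illustration.
class UniquePlaceholders {

    // Group 1 is the placeholder; the negative lookahead rejects it
    // when the same text appears again later on the same line.
    static final Pattern UNIQUE =
            Pattern.compile("(\\$\\{[^}]+\\})(?!.*?\\1)");

    static List<String> extractUnique(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = UNIQUE.matcher(input);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String feed = "sip ${dreamworks.values} print ${fwVer} to ${fwVer} used "
                    + "${lang} en ${lang}given ${model}  in ${region}";

        System.out.println(extractUnique(feed));
        // prints [${dreamworks.values}, ${fwVer}, ${lang}, ${model}, ${region}]
    }
}
```

Since only the last occurrence survives, an input such as "${a} ${b} ${a}" yields ${b} before ${a}.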

Rohit Jain answered Oct 06 '22 14:10


That is because the . is not included in \w. You need to create your own character class and add the dot to it.

PLACEHOLDER_PATTERN = "\\$\\{[\\w.]+\\}";

See the pattern here on Regexr.

However, this does not solve the requirement that duplicates be dropped; that is not a job for regular expressions.
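One way to handle the duplicates outside the regex is to collect the matches into a set. The following is a minimal sketch, assuming first-occurrence order should be preserved (hence LinkedHashSet); the names PlaceholderSet and extract are illustrative and do not come from the original post.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the name is only for illustration.
class PlaceholderSet {

    // Word characters and dots between ${ and }
    static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{[\\w.]+\\}");

    // Returns each distinct placeholder once, in order of first appearance.
    static Set<String> extract(String input) {
        Set<String> found = new LinkedHashSet<>();
        Matcher m = PLACEHOLDER.matcher(input);
        while (m.find()) {
            found.add(m.group()); // the set silently ignores repeats
        }
        return found;
    }

    public static void main(String[] args) {
        String feed = "sip ${dreamworks.values} print ${fwVer} to ${fwVer} used "
                    + "${lang} en ${lang}given ${model}  in ${region}";

        System.out.println(extract(feed));
        // prints [${dreamworks.values}, ${fwVer}, ${lang}, ${model}, ${region}]
    }
}
```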

If other characters can appear between the curly brackets, then Rohit's answer is better, since it matches any characters up to the closing bracket.

stema answered Oct 06 '22 15:10