 

Regex to match the beginning and the end of a string in Java

Tags:

java

regex

I want to extract a certain kind of string using a regex in Java. I currently have this pattern:

pattern = "^\\a.+\\sed$\n";

It is supposed to match a string that starts with "a" and ends with "sed", but it is not working. Did I miss something?

Removed the \n line at the end of the pattern and replaced it with a "$": Still doesn't get a match. The regex looks legit from my side.

What I want to extract is the "a sed" from the temp string.

String temp = "afsgdhgd gfgshfdgadh a sed afdsgdhgdsfgdfagdfhh";
pattern = "(?s)^a.*sed$";
pr = Pattern.compile(pattern);

math = pr.matcher(temp);
Sandeep Shah asked Dec 16 '15 11:12


2 Answers

UPDATE

You want to match a sed, so you can use a\\s+sed if there is only whitespace between a and sed:

String s = "afsgdhgd gfgshfdgadh a sed afdsgdhgdsfgdfagdfhh";
Pattern pattern = Pattern.compile("a\\s+sed");
Matcher matcher = pattern.matcher(s);
while (matcher.find()){
    System.out.println(matcher.group(0)); 
} 

See IDEONE demo

Now, if there can be anything between a and sed, use a tempered greedy token:

Pattern pattern = Pattern.compile("(?s)a(?:(?!a|sed).)*sed");
                                         ^^^^^^^^^^^^^  

See another IDEONE demo.
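To illustrate how the tempered greedy token behaves on the string from the question, here is a minimal sketch. The class name TemperedTokenDemo and the helper findAll are made up for this example; only the pattern itself comes from the answer above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TemperedTokenDemo {
    // Pattern from the answer: after "a", consume characters one at a time,
    // but only while the next position starts neither another "a" nor "sed".
    private static final Pattern TEMPERED =
            Pattern.compile("(?s)a(?:(?!a|sed).)*sed");

    // Hypothetical helper: collects every match found in the input.
    static List<String> findAll(String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = TEMPERED.matcher(input);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        String temp = "afsgdhgd gfgshfdgadh a sed afdsgdhgdsfgdfagdfhh";
        System.out.println(findAll(temp)); // prints [a sed]
    }
}
```

Because the lookahead rejects any position where another a begins, each match starts at the last a before a sed, which is why only "a sed" is returned rather than a longer stretch starting at the first a of the string.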

ORIGINAL ANSWER

The main problem with your regex is the \n at the end. $ asserts the end of the string, so requiring one more character after it cannot succeed on your input, which has no trailing line break. Also, \\s matches a whitespace character, but you need a literal s. Likewise, \\a is the bell character in a Java regex, not a literal a.
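A quick way to check what the escapes in the original pattern mean is to test them in isolation. The sketch below is only an illustration (the class name EscapeCheckDemo and the helper matchesWhole are invented for it); it relies on java.util.regex treating \s as a whitespace class and \a as the alert (bell) character, U+0007.

```java
import java.util.regex.Pattern;

public class EscapeCheckDemo {
    // Hypothetical helper: true if the whole input matches the regex.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // \s is the whitespace class, so it does not match the letter s.
        System.out.println(matchesWhole("\\s", "s"));      // false
        System.out.println(matchesWhole("\\s", " "));      // true
        // \a is the bell character, so it does not match the letter a.
        System.out.println(matchesWhole("\\a", "a"));      // false
        System.out.println(matchesWhole("\\a", "\u0007")); // true
    }
}
```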

You need to remove the backslashes before a and s, drop the \n, and make . match a newline. It is also advisable to use the * quantifier to allow zero characters in between:

pattern = "(?s)^a.*sed$";

See the regex demo

The regex matches:

  • ^ - start of string
  • a - a literal a
  • .* - zero or more of any character (the (?s) modifier makes . match any character, including a newline)
  • sed - a literal letter sequence sed
  • $ - end of string
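The breakdown above can be tried out with a short sketch. The class name AnchoredMatchDemo, the helper startsWithAEndsWithSed, and the first sample input are assumptions made for illustration; the pattern and the second input are taken from this page.

```java
import java.util.regex.Pattern;

public class AnchoredMatchDemo {
    // Pattern from the answer: the input must start with "a" and end with "sed".
    private static final Pattern ANCHORED = Pattern.compile("(?s)^a.*sed$");

    // Hypothetical helper: because of ^ and $, a successful find()
    // means the match spans the whole input.
    static boolean startsWithAEndsWithSed(String input) {
        return ANCHORED.matcher(input).find();
    }

    public static void main(String[] args) {
        // Starts with "a", ends with "sed", and contains a newline that (?s) lets . cross.
        System.out.println(startsWithAEndsWithSed("a quick\nbrown fox passed")); // true
        // The string from the question has text after "sed", so $ cannot match.
        System.out.println(startsWithAEndsWithSed(
                "afsgdhgd gfgshfdgadh a sed afdsgdhgdsfgdfagdfhh")); // false
    }
}
```

The second call shows why the anchored pattern suits validating a whole string but not extracting a fragment from the middle of a longer one.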
Wiktor Stribiżew answered Sep 28 '22 13:09


Your temp string cannot match the pattern (?s)^a.*sed$, because this pattern says that your temp string must begin with the character a and end with the sequence sed, which is not the case. Your string has trailing characters after the "sed" sequence. If you only want to extract that a...sed portion of the whole string, try using the unanchored pattern "a.*sed" and use the find() method of the Matcher class:

Pattern pattern = Pattern.compile("a.*sed");
Matcher m = pattern.matcher(temp);
if (m.find())
{
    System.out.println("Found string "+m.group());
    System.out.println("From "+m.start()+" to "+m.end());
}
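One thing worth checking with the unanchored pattern is where the match actually begins. Since .* is greedy and the search starts at the leftmost a, on the temp string from the question the match runs from the very first character up to sed. A minimal sketch comparing it with a stricter pattern follows; the class name GreedyVsExactDemo and the helper firstMatch are invented for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsExactDemo {
    // Hypothetical helper: returns the first match of regex in input, or null if none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String temp = "afsgdhgd gfgshfdgadh a sed afdsgdhgdsfgdfagdfhh";
        // Starts at the first "a" in the input and extends to the last "sed".
        System.out.println(firstMatch("a.*sed", temp));   // afsgdhgd gfgshfdgadh a sed
        // Allows only whitespace between "a" and "sed".
        System.out.println(firstMatch("a\\s+sed", temp)); // a sed
    }
}
```

If only the short "a sed" fragment is wanted, a pattern that restricts what may appear between a and sed is the safer choice.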
ragelh answered Sep 28 '22 13:09