
How to match the middle character in a string with regex?

In an odd number length string, how could you match (or capture) the middle character?

Is this possible with PCRE, plain Perl or Java regex flavors?

With .NET regex you could use balancing groups to solve it easily (that could be a good example). By plain Perl regex I mean not using any code constructs like (??{ ... }), with which you could run any code and of course do anything.

The string could be of any odd number length.

For example in the string 12345 you would want to get the 3, the character at the center of the string.

This is a question about the possibilities of modern regex flavors and not about the best algorithm to do that in some other way.

asked Jan 20 '15 at 17:01 by Qtax



2 Answers

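With PCRE and Perl you could use the pattern below. It will not compile as written in Java, because java.util.regex does not support the conditional construct (?(1)...):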

^(?:.(?=.*?(?(1)(?=.\1$))(.\1?$)))*(.)

which would capture the middle character of odd length strings in the 2nd capturing group.

Explained:

^ # beginning of the string
(?: # loop
  . # match a single character
  (?=
    # lazily scan ahead towards the end of the string
    .*?
    # if we already have captured the end of the string (skip the first iteration)
    (?(1)
      # make sure we do not go past the correct position
      (?= .\1$ )
    )
    # capture the end of the string +1 character, adding to \1 every iteration
    ( .\1?$ )
  )
)* # repeat
# the middle character follows, capture it
(.)
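
To make the loop easier to follow, here is a minimal plain-Java sketch of the bookkeeping the pattern appears to perform. It is not a regex and not part of the original answer; the class name MiddleCharSketch and the helper middle are hypothetical, and the loop condition is one reading of the explanation above: each iteration consumes one character from the left while \1 grows by one character from the right, and the loop stops once the longer tail would no longer fit.

```java
// Plain-Java sketch of the bookkeeping described above; this is not a regex.
class MiddleCharSketch {
    // Hypothetical helper: returns the character the final (.) would capture.
    static char middle(String s) {
        if (s.isEmpty()) {
            throw new IllegalArgumentException("empty string has no middle character");
        }
        int consumed = 0;   // characters eaten by the loop's leading '.'
        int suffixLen = 0;  // length of the tail currently held in \1
        // One more iteration is possible only if a tail one character longer
        // still starts at or after the position reached by consuming one more.
        while (consumed + 1 <= s.length() - (suffixLen + 1)) {
            consumed++;   // '.' matches one more character from the left
            suffixLen++;  // \1 grows by one character from the right
        }
        return s.charAt(consumed); // the final (.) captures this character
    }

    public static void main(String[] args) {
        System.out.println(middle("12345"));   // prints 3
        System.out.println(middle("abcdefg")); // prints d
    }
}
```

Under this reading, an even-length string yields the character at index length / 2, which is why the answer restricts itself to odd lengths.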
answered Nov 01 '22 at 11:11 by Qtax


Hmm, maybe someone can come up with a pure regex solution, but if not you could always dynamically build the regex like this:

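import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MiddleChar {
    public static void main(String[] args) {
        String s = "12345";
        int half = s.length() / 2;
        // Build ^.{half}(.).{half}$ so that group 1 is the middle character.
        Matcher m = Pattern.compile(String.format("^.{%d}(.).{%d}$", half, half)).matcher(s);
        if (m.matches()) System.out.println(m.group(1)); // prints 3
    }
}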
answered Nov 01 '22 at 10:11 by BarrySW19