Regular expression to skip character in capture group

Tags:

regex

Is it possible to skip a couple of characters in a capture group in regular expressions? I am using .NET regexes but that shouldn't matter.

Basically, what I am looking for is:

[random text]AB-123[random text]

and I need to capture 'AB123', without the hyphen.

I know that AB is 2 or 3 uppercase characters and 123 is 2 or 3 digits, but that's not the hard part. The hard part (at least for me) is skipping the hyphen.

I guess I could capture both separately and then concatenate them in code, but I wish I had a more elegant, regex-only solution.

Any suggestions?

Tamas Czinege asked Nov 10 '08 10:11



2 Answers

In short: you can't. A match is always a consecutive run of characters. Even when the pattern contains things such as zero-width assertions, there is no way around matching the next character if you want to get to the one after it.
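To make the point concrete, here is a minimal sketch in Python rather than .NET (the asker notes the flavor should not matter, and the behavior shown is the same in both). The sample string and the variable names are made up for illustration.

```python
import re

text = "xxAB-123yy"  # hypothetical input: lowercase filler around the code

# One group around the whole code: the hyphen is part of the captured text.
whole = re.search(r"([A-Z]{2,3}-[0-9]{2,3})", text)
print(whole.group(1))  # AB-123

# A capture is exactly the slice of the input between its start and end offsets.
print(text[whole.start(1):whole.end(1)] == whole.group(1))  # True

# A lookahead checks for "-123" without consuming it, so the group simply
# stops before the hyphen; it cannot jump over it and carry on.
ahead = re.search(r"([A-Z]{2,3})(?=-[0-9]{2,3})", text)
print(ahead.group(1))  # AB
```

Either the hyphen sits inside the group, or the group ends before it.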

Tomalak answered Oct 07 '22 08:10


There really isn't a way to create an expression whose matched text differs from what is found in the source text. You will need to remove the hyphen in a separate step, either by matching the first and second parts individually and concatenating the two groups:

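    // Requires: using System.Text.RegularExpressions;
    // [A-Z] covers all uppercase letters ([A-B] would only match A and B).
    Match match = Regex.Match( text, "([A-Z]{2,3})-([0-9]{2,3})" );
    // Group 1 holds the letters and group 2 the digits; the hyphen is in neither.
    string matchedText = string.Format( "{0}{1}",
        match.Groups[1].Value,
        match.Groups[2].Value );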

Or by removing the hyphen in a step separate from the matching process:

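    // Requires: using System.Text.RegularExpressions;
    Match match = Regex.Match( text, "[A-Z]{2,3}-[0-9]{2,3}" );
    // Match the whole code first, then strip the hyphen from the matched text.
    string matchedText = match.Value.Replace( "-", "" );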
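
For comparison, the same two-step idea can be sketched in Python. This is only an illustration under assumptions: the helper names `extract_code` and `strip_hyphens_in_codes` are hypothetical, the pattern assumes uppercase ASCII letters as described in the question, and the no-match check is an addition that the snippets above leave out.

```python
import re

# Pattern from the question: 2-3 uppercase letters, a hyphen, 2-3 digits.
CODE_PATTERN = re.compile(r"([A-Z]{2,3})-([0-9]{2,3})")


def extract_code(text):
    """Return the first code in text with its hyphen removed, or None."""
    match = CODE_PATTERN.search(text)
    if match is None:
        return None
    # Concatenate the two captured parts; the hyphen lies outside both groups.
    return match.group(1) + match.group(2)


def strip_hyphens_in_codes(text):
    """Rewrite every code in text, dropping the hyphen between the parts."""
    return CODE_PATTERN.sub(r"\1\2", text)


print(extract_code("foo AB-123 bar"))               # AB123
print(extract_code("nothing here"))                 # None
print(strip_hyphens_in_codes("x AB-12 y ABC-345"))  # x AB12 y ABC345
```

The substitution variant is probably the closest thing to a regex-only solution: the pattern still matches the hyphen, and the replacement string rebuilds the text from the two groups. In .NET the analogous call would be `Regex.Replace` with `"$1$2"` as the replacement.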

Jeff Hillman answered Oct 07 '22 08:10