I'm using capturing groups in regular expressions for the first time and I'm wondering what my problem is, as I assume that the regex engine looks through the string left-to-right.
I'm trying to convert an UpperCamelCase string into a hyphened-lowercase-string, so for example:
HelloWorldThisIsATest => hello-world-this-is-a-test
My precondition is an alphabetic string, so I don't need to worry about numbers or other characters. Here is what I tried:
mb_strtolower(preg_replace('/([A-Za-z])([A-Z])/', '$1-$2', "HelloWorldThisIsATest"));
The result:
hello-world-this-is-atest
This is almost what I want, except there should be a hyphen between a and test. I've already included A-Z in my first capturing group, so I would assume that the engine sees AT and hyphenates that.
What am I doing wrong?
The Reason your Regex will Not Work: Overlapping Matches
Your regex matches sA in IsATest, allowing you to insert a - between the s and the A.
In order to insert a - between the A and the T, the regex would have to match AT.
This is impossible because the A is already matched as part of sA. You cannot have overlapping matches in direct regex.
Do it in Two Easy Lines
Here's the easy way to do it with regex:
$regex = '~(?<=[a-zA-Z])(?=[A-Z])~';
echo strtolower(preg_replace($regex,"-","HelloWorldThisIsATest"));
See the output at the bottom of the php demo:
Output:
hello-world-this-is-a-test
Explanation
The (?<=[a-zA-Z]) lookbehind asserts that what precedes the current position is a letter.
The (?=[A-Z]) lookahead asserts that what follows the current position is an upper-case letter.
If both assertions succeed, replace the empty string at that position with a -, and convert the lot to lowercase.
If you look carefully on this regex101 screen, you can see lines between the words, where the regex matches.
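To make the difference between the two patterns concrete, here is a minimal sketch that prints what each one actually matches on the sample string. The helper names consumedPairs and splitOffsets are hypothetical and exist only for this illustration; the two patterns themselves are the ones from the question and the answer.

```php
<?php
// Hypothetical helpers for illustration; only the patterns come from the thread.

// Two-letter pairs consumed by the question's pattern.
// Each match uses up both letters, so the next search starts after them.
function consumedPairs(string $s): array {
    preg_match_all('/([A-Za-z])([A-Z])/', $s, $m);
    return $m[0];
}

// Byte offsets of the zero-width positions found by the lookaround pattern.
// Nothing is consumed, so adjacent positions can both match.
function splitOffsets(string $s): array {
    preg_match_all('~(?<=[a-zA-Z])(?=[A-Z])~', $s, $m, PREG_OFFSET_CAPTURE);
    return array_column($m[0], 1);
}

$input = "HelloWorldThisIsATest";

echo implode(' ', consumedPairs($input)), "\n"; // oW dT sI sA
echo implode(' ', splitOffsets($input)), "\n";  // 5 10 14 16 17
echo strtolower(preg_replace('~(?<=[a-zA-Z])(?=[A-Z])~', '-', $input)), "\n";
// hello-world-this-is-a-test
```

The first line of output shows that sA is consumed as one match, so the engine resumes at T and never gets to try AT. The second line shows offsets 16 and 17 both reported, which is the position before A and the position before T.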