PHP regex and adjacent capturing groups

I'm using capturing groups in regular expressions for the first time, and I'm wondering what my problem is, since I assume that the regex engine looks through the string left to right.

I'm trying to convert an UpperCamelCase string into a hyphenated-lowercase-string, for example:

HelloWorldThisIsATest => hello-world-this-is-a-test

My precondition is an alphabetic string, so I don't need to worry about numbers or other characters. Here is what I tried:

mb_strtolower(preg_replace('/([A-Za-z])([A-Z])/', '$1-$2', "HelloWorldThisIsATest"));

The result:

hello-world-this-is-atest

This is almost what I want, except there should be a hyphen between a and test. I've already included A-Z in my first capturing group, so I assumed the engine would see AT and hyphenate it.

What am I doing wrong?

The Reason Your Regex Will Not Work: Overlapping Matches

- Your regex matches sA in IsATest, which lets you insert a - between the s and the A.
- To insert a - between the A and the T, the regex would also have to match AT.
- That cannot happen, because the A was already consumed as part of sA. Matches never overlap: once sA has matched, the engine resumes searching at the T, and Te does not match [A-Za-z][A-Z].
- Is all hope lost? No! This is a perfect situation for lookarounds, which check the characters around a position without consuming them.
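The following sketch makes the non-overlapping behavior visible. It assumes PHP 7.1+ with the bundled PCRE extension. First, preg_match_all with the question's pattern lists every match it finds. Then a variant keeps the first capturing group but turns the second into a lookahead, which is one way to get the missing hyphen. The variable names are illustrative. strtolower stands in for mb_strtolower, which behaves the same here because the input is plain ASCII letters.

```php
<?php
$subject = 'HelloWorldThisIsATest';

// The question's pattern: every match consumes two letters.
preg_match_all('/([A-Za-z])([A-Z])/', $subject, $m);

// Each new attempt starts where the previous match ended, so matches never
// overlap: after "sA" the engine resumes at "T", and "Te" does not match.
echo implode(', ', $m[0]), "\n"; // oW, dT, sI, sA

// Variant: keep the first capturing group, but check the upper-case letter
// with a lookahead instead of consuming it, so "A" can start the next match.
$fixed = preg_replace('/([A-Za-z])(?=[A-Z])/', '$1-', $subject);
echo strtolower($fixed), "\n"; // hello-world-this-is-a-test
```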

Do it in Two Easy Lines

Here's the easy way to do it with regex:

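$regex = '~(?<=[a-zA-Z])(?=[A-Z])~'; // zero-width: after a letter, before an upper-case letter
echo strtolower(preg_replace($regex, "-", "HelloWorldThisIsATest")); // insert "-" at each position, then lowercase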

Running the snippet in a PHP demo prints:

Output: hello-world-this-is-a-test
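If you need the two lines more than once, you can wrap them in a function. This is only a sketch. camelToHyphenated is a hypothetical name, and the function assumes the question's precondition of an ASCII-only alphabetic input. The extra calls show that no leading hyphen appears, because at position 0 there is no letter for the lookbehind to see.

```php
<?php
/**
 * Hypothetical helper wrapping the answer's two lines.
 * Assumes an ASCII-only alphabetic input, as in the question.
 */
function camelToHyphenated(string $s): string
{
    // Zero-width match: after a letter, before an upper-case letter.
    $hyphenated = preg_replace('~(?<=[a-zA-Z])(?=[A-Z])~', '-', $s);
    return strtolower($hyphenated);
}

echo camelToHyphenated('HelloWorldThisIsATest'), "\n"; // hello-world-this-is-a-test
// No leading hyphen: position 0 has no letter before it for the lookbehind.
echo camelToHyphenated('Hello'), "\n"; // hello
// Consecutive capitals are each separated, because positions can be adjacent.
echo camelToHyphenated('ABC'), "\n"; // a-b-c
```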


- The regex doesn't match any characters. Rather, it targets positions in the string: every position where a letter is followed by an upper-case letter (for example between s and A, and also between A and T). To do so, it uses a lookbehind and a lookahead.
- The (?<=[a-zA-Z]) lookbehind asserts that what precedes the current position is a letter.
- The (?=[A-Z]) lookahead asserts that what follows the current position is an upper-case letter.
- We just replace these positions with a -, and convert the lot to lowercase.
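The sketch below checks that the matches really are empty positions. It again assumes PHP 7.1+ for the list-style foreach. preg_match_all is called with the PREG_OFFSET_CAPTURE flag, which returns where each match sits. Because nothing is consumed, you can also pass the same pattern to preg_split to break the string into words without losing any letters.

```php
<?php
$subject = 'HelloWorldThisIsATest';
$pattern = '~(?<=[a-zA-Z])(?=[A-Z])~';

// With PREG_OFFSET_CAPTURE each match is a [text, byte offset] pair.
preg_match_all($pattern, $subject, $m, PREG_OFFSET_CAPTURE);

foreach ($m[0] as [$text, $offset]) {
    // $text is always '' because the lookarounds consume no characters.
    printf("offset %2d: '%s' (just before %s)\n", $offset, $text, $subject[$offset]);
}

// Splitting on the empty matches keeps every letter.
$words = preg_split($pattern, $subject);
echo implode('-', array_map('strtolower', $words)), "\n"; // hello-world-this-is-a-test
```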

If you try the pattern on regex101, you can see thin lines between the words where the regex matches.



Tags: string, regex, php, camelcasing, backreference

rink.attendant.6


zx81
