I'm trying to figure out the reason behind some regex comparison results I'm getting in Vim. I'm trying to match strings that begin a line with one or more asterisks. Here's how various regexes match the strings:
echo '* text is here' =~ '\^*\*\s' prints 1 (i.e., MATCH)
echo '* text is here' =~ '^*\*\s' prints 0 (NO MATCH)
echo '** text is here' =~ '\^*\*\s' (MATCH)
echo '** text is here' =~ '^*\*\s' (MATCH)
echo '*** text is here' =~ '\^*\*\s' (MATCH)
echo '*** text is here' =~ '^*\*\s' (NO MATCH)
echo 'text is here' =~ '\^*\*\s' (NO MATCH)
echo 'text is here' =~ '^*\*\s' (NO MATCH)
echo '*text is here' =~ '\^*\*\s' (NO MATCH)
echo '*text is here' =~ '^*\*\s' (NO MATCH)
From these results I gather that when the beginning-of-line character (^) is not preceded by a backslash, the following * is read as a literal and the backslash-* is also read as a literal. So the pattern without the initial backslash matches only strings that begin with exactly two asterisks followed by a whitespace.
When the ^-character is prepended with a backslash the first asterisk is a literal asterisk and the backslash-* stands for 'zero or more of preceding character'.
The version with the initial backslash gives me the answers I want; i.e., it matches all-and-only lines beginning with one or more asterisks followed by a whitespace. Why is this? When I look at the Vim documentation it says that \^ stands for a literal ^, not the beginning of a line. I'm sure there's a simple explanation but I can't see it. Thanks for any clarification.
I also notice some similar behavior when typing in this question. That is, the following string has a backslash before the second asterisk that doesn't show up in the text: '^**\s' .
UPDATE: Okay, I think I've grokked Ross' answer and see that the de-anchoring was giving me the result I wanted. The de-anchoring is also giving me a result I don't want, namely:
echo 'text* is here' =~ '\^*\*\s' (MATCH)
SO MY QUESTION NOW IS: what regex will match all-and-only lines that begin with one or more asterisks followed by a whitespace? The regex below gets close but fails on the final example:
echo '*** text is here' =~ '^**\s' (MATCH)
echo '* text is here' =~ '^**\s' (MATCH)
echo 'text* is here' =~ '^**\s' (NO MATCH)
echo ' * text is here' =~ '^**\s' (MATCH) -- want a no match here
The version with slash-asterisk as first asterisk doesn't work either (i.e., '^\**\s' ).
FINAL UPDATE: Okay, I think I found the version that works. I don't understand exactly why it works, though. It looks like what I would expect except for the asterisk after the ^ character, but having a repeater after the ^ seems nonsensical:
echo '*** text is here' =~ '^*\**\s' (MATCH)
echo '* text is here' =~ '^*\**\s' (MATCH)
echo 'text* is here' =~ '^*\**\s' (NO MATCH)
echo ' * text is here' =~ '^*\**\s' (NO MATCH)
Ahh, interesting explanation, but not quite right.
The \^ indeed refers to a literal circumflex. But * doesn't mean "one or more", it means "zero or more", so \^* simply matches nothing if it needs to in order to make the rest of the expression succeed, and in addition it obviously will "deanchor" the rest of the search making it easier to succeed.
I imagine that with this piece of the puzzle filled in you will have no trouble understanding the rest...
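To make the "matches nothing" and "deanchor" points concrete, here is a small sketch you could paste into a scratch buffer and :source. The test strings 'a^^b' and 'ab' are made up for illustration; the rest come from the question. Comments are on their own lines because :echo would read a trailing " as the start of a string.

```vim
" \^ is a literal circumflex, and the * after it means "zero or more".
echo 'a^^b' =~ 'a\^*b'
" prints 1: \^* consumes the two circumflexes

echo 'ab' =~ 'a\^*b'
" prints 1: \^* matches the empty string

" Nothing in the pattern is anchored, so '* ' may be found anywhere.
echo 'text* is here' =~ '\^*\*\s'
" prints 1

" Dropping the \^* prefix gives the same answer for this string.
echo 'text* is here' =~ '\*\s'
" prints 1
```

In other words, the \^* prefix contributes nothing here except hiding the fact that the pattern has no anchor.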
Update: I think the final piece of the puzzle is that vi does something a bit different with out-of-context regex magic characters. If you use one in a context where it can't be magic, you won't get an error like you might with Perl or Ruby, the character simply becomes non-magic. And * doesn't repeat the ^ anchor, so a search like /*/ or /^*/ will look for any actual * or a line beginning with an actual *, respectively.
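Here is a rough sketch of that behaviour, using =~ the same way the question does. It also reads the asker's final pattern piece by piece. The last two lines use \+ ("one or more"), which is my own suggestion for a more readable spelling and not something from the question; the short test strings 'a*b', 'ab', '* text' and ' * text' are made up.

```vim
" A leading * has nothing to repeat, so it is taken literally.
echo 'a*b' =~ '*'
" prints 1
echo 'ab' =~ '*'
" prints 0

" Directly after ^ the * is literal too: "line starts with an asterisk".
echo '* text' =~ '^*'
" prints 1
echo ' * text' =~ '^*'
" prints 0

" The final pattern from the question, read left to right:
"   ^    start of line
"   *    one literal asterisk (non-magic right after ^)
"   \**  a literal asterisk (\*) repeated zero or more times (*)
"   \s   one whitespace character
echo '*** text is here' =~ '^*\**\s'
" prints 1
echo ' * text is here' =~ '^*\**\s'
" prints 0

" Suggested alternative (not from the question): \*\+ = one or more asterisks.
echo '* text is here' =~ '^\*\+\s'
" prints 1
echo 'text* is here' =~ '^\*\+\s'
" prints 0
```

So the * right after ^ in the final pattern is not a repeater at all. It is the one mandatory asterisk, and \** supplies any further ones.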