
question about regex in Vim

Tags:

regex

vim

I'm trying to figure out the reason behind some regex comparison results I'm getting in Vim. I'm trying to match strings that begin a line with one or more asterisks. Here's how various regexes match the strings:

echo '* text is here' =~ '\^*\*\s'  prints 1 (i.e., MATCH)
echo '* text is here' =~ '^*\*\s'   prints 0 (NO MATCH)

echo '** text is here' =~ '\^*\*\s' (MATCH)
echo '** text is here' =~ '^*\*\s'  (MATCH)

echo '*** text is here' =~ '\^*\*\s' (MATCH)
echo '*** text is here' =~ '^*\*\s'  (NO MATCH)

echo 'text is here' =~ '\^*\*\s' (NO MATCH)
echo 'text is here' =~ '^*\*\s'  (NO MATCH)

echo '*text is here' =~ '\^*\*\s' (NO MATCH)
echo '*text is here' =~ '^*\*\s'  (NO MATCH)

From these results I gather that when the beginning-of-line character (^) is not preceded by a backslash, the following * is read as a literal and the backslash-* is also read as a literal. So the comparison using the no-initial-backslash pattern matches only strings that begin with exactly two asterisks followed by a whitespace.

When the ^ character is preceded by a backslash, the first asterisk is a literal asterisk and the backslash-* stands for 'zero or more of the preceding character'.

The version with the initial backslash gives me the answers I want; i.e., it matches all-and-only lines beginning with one or more asterisks followed by a whitespace. Why is this? When I look at the Vim documentation it says that \^ stands for a literal ^, not the beginning of a line. I'm sure there's a simple explanation but I can't see it. Thanks for any clarification.

I also notice some similar behavior when typing in this question. That is, the following string has a backslash before the second asterisk that doesn't show up in the text: '^**\s' .

UPDATE: Okay, I think I've grokked Ross' answer and see that the de-anchoring was giving me the result I wanted. The de-anchoring is also giving me a result I don't want, namely:

echo 'text* is here' =~ '\^*\*\s' (MATCH)

SO MY QUESTION NOW IS: what regex will match all-and-only lines that begin with one or more asterisks followed by a whitespace? The regex below gets close but fails on the final example:

echo '*** text is here' =~ '^**\s' (MATCH)
echo '* text is here' =~ '^**\s' (MATCH)
echo 'text* is here' =~ '^**\s' (NO MATCH)
echo ' * text is here' =~ '^**\s' (MATCH) -- want a no match here

The version with slash-asterisk as first asterisk doesn't work either (i.e., '^\**\s' ).

FINAL UPDATE: Okay, I think I found the version that works. I don't understand exactly why it works, though. It looks like what I would expect except for the asterisk after the ^ character, but having a repeater after the ^ seems nonsensical:

echo '*** text is here' =~ '^*\**\s' (MATCH)
echo '* text is here' =~ '^*\**\s'   (MATCH)
echo 'text* is here' =~ '^*\**\s'   (NO MATCH)
echo ' * text is here' =~ '^*\**\s' (NO MATCH)
Herbert Sitz asked Feb 26 '23 07:02



1 Answer

Ahh, interesting explanation, but not quite right.

The \^ indeed refers to a literal circumflex.

But * doesn't mean "one or more"; it means "zero or more". So \^* simply matches nothing when that is what the rest of the expression needs in order to succeed, and it also "de-anchors" the rest of the search, making it easier to succeed.
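A quick way to see the de-anchoring is to ask Vim which part of the string was matched. This is only a sketch: the sample strings are made up for illustration (the first is from the question's update), and matchstr() is used here simply to show the matched text.

```vim
" \^* is 'zero or more literal carets', so it can match the empty string
" at any position; \*\s is then free to match in the middle of the line.
echo 'text* is here' =~ '\^*\*\s'
" prints 1

" matchstr() returns the text that the pattern matched
echo matchstr('text* is here', '\^*\*\s')
" prints '* ' (nothing was consumed by \^*)

echo matchstr('^^* carets', '\^*\*\s')
" prints '^^* ' (here \^* consumed the two literal carets)
```

Because nothing in the pattern ties it to the start of the line, any "asterisk followed by whitespace" anywhere in the string is enough for a match.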

I imagine that with this piece of the puzzle filled in you will have no trouble understanding the rest...

Update: I think the final piece of the puzzle is that vi does something a bit different with out-of-context regex magic characters. If you use one in a context where it can't be magic, you won't get an error like you might with Perl or Ruby; the character simply becomes non-magic. And * doesn't repeat the ^ anchor, so a search like /*/ or /^*/ will look for any actual * or a line beginning with an actual *, respectively.
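To make that concrete, here is a small sketch comparing the pattern from the question's final update with an equivalent spelling that uses \+ ("one or more") instead of relying on the literal * after ^. The variable names samples and sample are just illustrative, and the sample strings are taken from the question.

```vim
" Sample lines taken from the question
let samples = ['* text is here', '*** text is here', 'text* is here',
      \ ' * text is here', '*text is here']

" ^*\**\s : right after ^ the first * is a literal star,
"           and \** is zero or more further literal stars
" ^\*\+\s : one or more literal stars, written explicitly with \+
for sample in samples
  echo printf('%-18s %d %d', sample, sample =~ '^*\**\s', sample =~ '^\*\+\s')
endfor
```

The two result columns should agree on every line: 1 for the first two samples and 0 for the other three. The \+ form may be easier to read because it does not depend on the special case for * after ^ (see :help /star). With "very magic" the same idea can be written as '\v^\*+\s'.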

DigitalRoss answered Mar 08 '23 06:03
