
Why is \%(\) faster than \(\) in Vim?

Tags: regex, vim

I am confused by the docs:

\%(\)   A pattern enclosed by escaped parentheses.   */\%(\)* */\%(* *E53*
        Just like \(\), but without counting it as a sub-expression. This
        allows using more groups and it's a little bit faster.

Can someone explain the reason for the difference? Is it because of backtracking or something else?

asked Apr 11 '09 22:04 by Léo Léopold Hertz 준영




2 Answers

The 'a little bit faster' comment is accurate in that there is a little less bookkeeping to be done, but the emphasis is on 'little bit' rather than 'faster'. Normally, the text matched by \(pattern\) has to be remembered so that you can refer to it with \3 (or whichever number applies) in the replacement. The \%(...\) notation means that Vim does not have to keep track of that match, so it does a little less work.


@SimpleQuestions asks:

What do you mean by "keep track of the match"? How does it affect speed?

You can use escaped parentheses to 'capture' parts of the matched pattern. For example, suppose we're playing with simple C function declarations - no pointers to functions or other sources of parentheses - then we might have a substitute command such as the following:

s@\<\([a-zA-Z_][a-zA-Z_0-9]*\)(\([^)]*\))@xyz_\1(int nargs) /* \2 */@

Given an input line such as:

int simple_function(int a, char *b, double c)

The output will be:

int xyz_simple_function(int nargs) /* int a, char *b, double c */

(Why might you want to do that? I'm imagining that I need to wrap the C function simple_function so that it can be called from a language compiled to C that uses a different interface convention - it is based on Informix 4GL, to be precise. I'm using it to get an example - not because you really need to know why it was a good change to make.)

Now, in the example, the \1 and \2 in the replacement text refer to the captured parts of the regular expression - the function name (a sequence of alphanumerics starting with an alphabetic character - counting underscore as 'alphabetic') and the function argument list (everything between the parentheses, but not including the parentheses).
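For readers who want to try the capture behaviour outside Vim, here is a minimal sketch of the same substitution using Python's `re` module. It is only an analogue, not Vim's engine: `\<` is approximated with `\b`, groups are written `(...)` instead of `\(...\)`, and the names `DECLARATION`, `name`, `args` and `wrapped` are made up for this illustration.

```python
import re

# Python spelling of the Vim pattern above: \< is approximated by \b,
# and capturing groups are written (...) instead of \(...\).
DECLARATION = re.compile(r"\b([a-zA-Z_][a-zA-Z_0-9]*)\(([^)]*)\)")

line = "int simple_function(int a, char *b, double c)"

# Group 1 is the function name, group 2 is the argument list.
match = DECLARATION.search(line)
name, args = match.group(1), match.group(2)

# \1 and \2 in the replacement refer to those two captured groups.
wrapped = DECLARATION.sub(r"xyz_\1(int nargs) /* \2 */", line)
print(wrapped)
```

The engine has to remember where each of the two groups matched so that the replacement can use them, which is the "keeping track" described above.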

If I'd used the \%(...\) notation around the function identifier, then \1 would refer to the argument list and there would be no second captured group for \2 to refer to. Because Vim would have to keep track of only one captured part of the regular expression instead of two, it has marginally less bookkeeping to do. But, as I said, the difference is tiny; you would probably never measure it in practice.

That's why the manual says it 'allows using more groups': back-references only go from \1 to \9, so if you need to group parts of your regular expression but don't need to refer to them again, non-capturing groups let you work with longer regular expressions. However, by the time you have more than 9 remembered (captured) parts in a regular expression, your brain is usually doing gyrations and your fingers will make mistakes anyway, so the effort is not usually worth it. But that is, I think, the argument for using the \%(...\) notation. It corresponds to the Perl (PCRE) notation '(?:...)' for a non-capturing group.
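The renumbering can be seen with the Perl-style `(?:...)` group in Python's `re` module. This is a sketch of the analogous behaviour, not of Vim itself, and the variable names are invented for the example.

```python
import re

line = "int simple_function(int a, char *b, double c)"

# Both parts captured: identifier is group 1, argument list is group 2.
both_captured = re.compile(r"\b([a-zA-Z_][a-zA-Z_0-9]*)\(([^)]*)\)")

# Identifier wrapped in a non-capturing group: only the argument list is captured.
name_not_captured = re.compile(r"\b(?:[a-zA-Z_][a-zA-Z_0-9]*)\(([^)]*)\)")

# .groups reports how many capturing groups a compiled pattern has.
print(both_captured.groups)
print(name_not_captured.groups)

# With the identifier group non-capturing, group 1 is now the argument list.
first_group = name_not_captured.search(line).group(1)
print(first_group)
```

Both patterns match the same text; the only difference is how many sub-matches the engine is asked to remember.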

answered Sep 21 '22 18:09 by Jonathan Leffler


I asked in #Vim whether \%(\) is faster than \(\) because of backtracking. The user godlygeek answered:

No, it's faster because the thing that's matched doesn't need to be strdup'ed -- any unnecessary work is a bad thing for a syntax file.

He continued:

[The speed] depends on how big the string is. For 3 characters, it doesn't matter much, for 3000 it probably does. And keep in mind that it needs to be strdup'ed every time it matches.... including during backtracking... which means that even the 3 characters could be strdup'ed 1000 times over the course of matching a single regex. -- the syntax files are in $VIMRUNTIME/syntax
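If you want to know whether the difference matters for a particular pattern, measuring is more reliable than guessing. The following is a rough sketch of such a measurement using Python's `re` and `timeit`. It says nothing about Vim's own engine or its internal copying; the test data `LINE`, the two patterns and the helper `count_matches` are assumptions made up for the illustration, and the timings will vary from machine to machine.

```python
import re
import timeit

# Hypothetical test data: one long line of repeated three-letter words.
LINE = "abc " * 3000

CAPTURING = re.compile(r"(abc) ")
NON_CAPTURING = re.compile(r"(?:abc) ")


def count_matches(pattern):
    # Walk over every match without ever using a captured group.
    return sum(1 for _ in pattern.finditer(LINE))


if __name__ == "__main__":
    for label, pattern in (("capturing", CAPTURING), ("non-capturing", NON_CAPTURING)):
        seconds = timeit.timeit(lambda: count_matches(pattern), number=200)
        print(f"{label}: {seconds:.4f}s for 200 runs")
```

Both patterns find the same matches, so any difference in the timings comes from the extra bookkeeping for the captured group.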

answered Sep 22 '22 18:09 by Léo Léopold Hertz 준영