Jeffrey Friedl lists 3 main types of regex engines in his book "Mastering Regular Expressions":

- Traditional NFA
- POSIX NFA
- DFA (POSIX or not)

Which of these does R use as a standard?


Which regular expression engine type does R use as a standard?

2 Answers

The ?regex page cites the TRE documentation. Near the top of the grep.c source we see:

/* As from TRE 0.8.0, tre.h replaces regex.h */
#include <tre/tre.h>

To repeat my earlier comment: http://swtch.com/~rsc/regexp says that TRE uses an NFA. PCRE is used when perl=TRUE.


answered Oct 04 '22 03:10

IRTFM

My understanding (though I have not found this in the official documentation) is that the R regex functions by default use the Tcl regex library, which is a hybrid of DFA and NFA.

The engine first scans the regex for any pieces that are not DFA-compatible and extracts the parts that are (so it strips out back references and other features that are only available in an NFA). It then tries to find a match for this (possibly) simplified pattern using a DFA engine. If it cannot find a match, then the full regex cannot match either, and it returns a failure. If it does find a match, it goes back and matches the full regex using an NFA engine (I think traditional/non-POSIX), starting at the location where the simplified match occurred. This is much faster (for both non-matches and matches) than a straight NFA engine, but still lets you use all the NFA features that a DFA does not support.
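To make the two-stage idea above concrete, here is a rough sketch in Python. It is not how any R or Tcl library is actually implemented. Python's re module is itself a backtracking engine, so the "relaxed" pattern here only stands in for the DFA stage. The function name hybrid_search and the hand-written relaxed pattern are assumptions for illustration; a real engine would derive the simplified pattern automatically.

```python
import re


def hybrid_search(full_pattern, relaxed_pattern, text):
    """Two-stage search: cheap prefilter first, then the full pattern.

    Assumption: relaxed_pattern matches everything full_pattern matches
    (a superset), so a prefilter miss implies a full-pattern miss.
    """
    prefilter = re.compile(relaxed_pattern)
    full = re.compile(full_pattern)

    # Stage 1: look for the simplified pattern (stand-in for the DFA pass).
    candidate = prefilter.search(text)
    if candidate is None:
        return None  # the full pattern cannot match either

    # Stage 2: run the full pattern, starting where the prefilter matched.
    return full.search(text, candidate.start())


# Full pattern uses a back reference (a repeated word); the relaxed
# version replaces the back reference with a plain \w+.
full = r"\b(\w+) \1\b"
relaxed = r"\b\w+ \w+\b"

m = hybrid_search(full, relaxed, "this is is a test")
print(m.group(0) if m else None)
print(hybrid_search(full, relaxed, "nospaceshere"))
```

Starting stage 2 at the prefilter's match position is safe only because the relaxed pattern accepts a superset of the full pattern: any full match at some position implies a relaxed match at that same position, so the leftmost relaxed match cannot lie to the right of it.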

If you specify perl=TRUE in any function, it switches to the PCRE library, which is most like a traditional NFA (though I understand that it is not F, A, or traditional).

answered Oct 04 '22 02:10

Greg Snow


Tags: regex, r

histelheim
