I am using gsub in R to insert text into the middle of a string. It works perfectly, but when the insert position gets too large it throws an error. The code is below:
gsub(paste0('^(.{', as.integer(loc[1])-1, '})(.+)$'), new_cols, sql)
Error in gsub(paste0("^(.{273})(.+)$"), new_cols, sql) : invalid regular expression '^(.{273})(.+)$', reason 'Invalid contents of {}'
This code works fine when the number in the braces (273 in this case) is smaller, but not when it is this large.
This reproduces the error:
sql <- "The cat with the bat went to town. He ate the fat mat and wouldn't stop til the sun came up. He was a fat cat that lived with a rat who owned many hats.The cat with the bat went to town. He ate the fat mat and wouldn't stop til the sun came up. He was a fat cat that lived with a rat who owned many hats."
new_cols <- "happy"
gsub('^(.{125})(.+)$', new_cols, sql)  # works
gsub('^(.{273})(.+)$', new_cols, sql)
Error in gsub("^(.{273})(.+)$", new_cols, sql) : invalid regular expression '^(.{273})(.+)$', reason 'Invalid contents of {}'
Regular expressions (regex for short) describe patterns in strings. They can be used to find, replace, or remove the parts of a string that match a pattern.
gsub stands for global substitution (replace everywhere). It replaces every match of a regular expression in the given string with the replacement string, whereas sub replaces only the first match.
The gsub() function in R takes a pattern, a replacement, and an input character vector, and returns the input with each match substituted. By default the pattern is interpreted as a regular expression; pass fixed = TRUE to match it literally.
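As a small illustration of the behaviour described above, the sketch below contrasts sub, gsub, and gsub with fixed = TRUE on a made-up input string. The variable names are purely illustrative.

```r
x <- "a.b.c"

# sub() replaces only the first match; "." as a regex matches any character.
first_only <- sub(".", "-", x)

# gsub() replaces every match, so every character becomes "-".
every_match <- gsub(".", "-", x)

# With fixed = TRUE the pattern is a literal dot, not a regex.
literal_dot <- gsub(".", "-", x, fixed = TRUE)

print(first_only)   # "-.b.c"
print(every_match)  # "-----"
print(literal_dot)  # "a-b-c"
```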
R gsub uses the TRE regex library by default. The bounds in a limiting quantifier are valid from 0 to RE_DUP_MAX, which is defined in the TRE code. See this TRE reference:
A bound is one of the following, where n and m are unsigned decimal integers between 0 and RE_DUP_MAX
It seems that RE_DUP_MAX is set to 255 (see this TRE source file showing #define RE_DUP_MAX 255), and thus you cannot use a larger number in the {n,m} limiting quantifier.
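One way to check the limit on your own installation is to probe which repeat counts compile. This is only a sketch: the helper name compiles is hypothetical, and the results for the default engine assume RE_DUP_MAX is 255 in your build of R, as described above.

```r
# A 300-character test string, long enough for every pattern tried here.
x <- strrep("a", 300)

# Hypothetical helper: TRUE if "^.{n}" compiles and runs, FALSE on error.
# Extra arguments (such as perl = TRUE) are passed on to grepl().
compiles <- function(n, ...) {
  pattern <- sprintf("^.{%d}", n)
  tryCatch(
    suppressWarnings({ grepl(pattern, x, ...); TRUE }),
    error = function(e) FALSE
  )
}

print(compiles(255L))               # default (TRE) engine, at the assumed limit
print(compiles(256L))               # default (TRE) engine, one past the assumed limit
print(compiles(256L, perl = TRUE))  # PCRE engine
```

If the limit is 255 as assumed, the first and third calls should succeed and the second should fail.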
Use the PCRE regex flavor: add perl = TRUE and it will work.
R demo:
> sql <- "The cat with the bat went to town. He ate the fat mat and wouldn't stop til the sun came up. He was a fat cat that lived with a rat who owned many hats.The cat with the bat went to town. He ate the fat mat and wouldn't stop til the sun came up. He was a fat cat that lived with a rat who owned many hats."
> new_cols <- "happy"
> gsub('^(.{273})(.+)$', new_cols, sql, perl=TRUE)
[1] "happy"
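Note that the demo replaces the whole string, because the replacement does not refer back to the capture groups. To actually insert text at a position, as the question intends, the groups can be put back with \\1 and \\2. The sketch below shows that approach, plus a regex-free alternative built on substr; the function names insert_at and insert_at_substr are hypothetical, and the regex version assumes the inserted text contains no backslashes.

```r
# Insert `insertion` so that it starts at character position `pos`.
# Assumption: `insertion` contains no backslashes, since they would be
# interpreted as part of the replacement syntax.
insert_at <- function(text, insertion, pos) {
  pattern <- sprintf("^(.{%d})(.*)$", pos - 1L)
  sub(pattern, paste0("\\1", insertion, "\\2"), text, perl = TRUE)
}

# Regex-free alternative: split with substr() and glue the pieces together.
# This avoids quantifier limits and replacement escaping altogether.
insert_at_substr <- function(text, insertion, pos) {
  paste0(substr(text, 1L, pos - 1L), insertion, substr(text, pos, nchar(text)))
}

print(insert_at("abcdef", "XY", 4L))         # "abcXYdef"
print(insert_at_substr("abcdef", "XY", 4L))  # "abcXYdef"
```

For the example in the question, insert_at(sql, new_cols, 274L) would place "happy" after the first 273 characters.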