I am experiencing difficulty with the perl expression \\L\\1 in very particular circumstances on R-dev (2017-06-06 and 2017-06-16 r72796 builds):
bib <- readLines("https://raw.githubusercontent.com/HughParsonage/TeXCheckR/master/tests/testthat/lint_bib_in.bib", encoding = "UTF-8")
leading_spaces <- 2
is_field <- grepl("=", bib, fixed = TRUE)
field_width <- nchar(trimws(gsub("[=].*$", "", bib, perl = TRUE)))
widest_field <- max(field_width[is_field])
out <- bib
# Line-by-line gsub (the padding differs for each line):
for (line in seq_along(bib)) {
  # Replace every field line with
  # two spaces + field name + spaces required for widest field + space
  if (is_field[line]) {
    spaces_req <- widest_field - field_width[line]
    out[line] <-
      gsub("^\\s*(\\w+)\\s*[=]\\s*\\{",
           paste0(paste0(rep(" ", leading_spaces), collapse = ""),
                  "\\L\\1",
                  paste0(rep(" ", spaces_req), collapse = ""),
                  " = {"),
           bib[line],
           perl = TRUE)
  }
}
# Add commas:
out[is_field] <- gsub("\\}$", "\\},", out[is_field], perl = TRUE)
out[9]
#> R-dev " author"
#> R 3.4.0 " author = {Tony Wood and Amélie Hunter and Michael O'Toole and Prasana Venkataraman and Lucy Carter},"
To reproduce, it is necessary to (1) use readLines on a file and specify the encoding (using dput won't reproduce it), and (2) use \\L or \\U in the perl regex.
Is this a change in R 3.5.0, or have I been misusing \\L in this instance?
UPDATE
The patch correcting this behaviour was applied in r74274.
ORIGINAL ANSWER
There is clearly some unexpected behavior.
When the replacement refers to a plain \1, it works, outputting:
[1] " author = {Tony Wood and Amélie Hunter and Michael O'Toole and Prasana Venkataraman and Lucy Carter},"
However, whenever \U or \L is used with \1, the second backreference gets removed:
"\\U\\1": [1] " AUTHOR"
"\\U\\1\\E\\2": [1] " AUTHOR"
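For comparison, here is a minimal sketch of what \L is documented to do in a perl = TRUE replacement: convert the rest of the replacement to lower case until \E ends the conversion. The input string `x` is made up for illustration and is not taken from the .bib file; on a build without the bug, the text after the match should survive.

```r
# Hypothetical one-line input, not read from the .bib file in the question
x <- "  AUTHOR = {Tony Wood}"

# \\L lower-cases the first capture group; \\E ends the case conversion
# so that the literal " = {" is appended unchanged
fixed <- sub("^\\s*(\\w+)\\s*=\\s*\\{", "  \\L\\1\\E = {", x, perl = TRUE)
fixed
# expected on an unaffected build: "  author = {Tony Wood}"
```

If the same call is run on a line obtained through readLines(..., encoding = "UTF-8") on an affected R-dev build, the behaviour described above would instead cut the result off after the field name.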
A gsubfn solution still works (here, an example with toupper()):
library(gsubfn)
bib <- readLines("https://raw.githubusercontent.com/HughParsonage/TeXCheckR/master/tests/testthat/lint_bib_in.bib", encoding = "UTF-8")
leading_spaces <- 2
is_field <- grepl("=", bib, fixed = TRUE)
field_width <- nchar(trimws(gsub("[=].*$", "", bib, perl = TRUE)))
widest_field <- max(field_width[is_field])
out <- bib
# Line-by-line gsubfn (the padding differs for each line):
for (line in seq_along(bib)) {
  # Replace every field line with
  # two spaces + field name + spaces required for widest field + space
  if (is_field[line]) {
    spaces_req <- widest_field - field_width[line]
    out[line] <-
      gsubfn("^\\s*(\\w+)\\s*=\\s*\\{",
             # y is the captured field name
             function(y) paste0(
               paste0(rep(" ", leading_spaces), collapse = ""),
               toupper(y),
               paste0(rep(" ", spaces_req), collapse = ""),
               " = {"
             ),
             bib[line], engine = "R"
      )
  }
}
# Add commas:
out[is_field] <- gsub("\\}$", "},", out[is_field], perl = TRUE)
out[9]
Output:
[1] " AUTHOR = {Tony Wood and Amélie Hunter and Michael O'Toole and Prasana Venkataraman and Lucy Carter},"
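If adding a dependency on gsubfn is not desirable, the same idea can be sketched in base R by extracting the field name with regexec/regmatches and lower-casing it with tolower(), so that no \L or \U appears in a replacement string. This is only a sketch under assumptions: `format_field` and the small `bib_demo` vector are hypothetical names introduced here for illustration, and it has not been checked against the affected R-dev builds.

```r
# Hypothetical helper: rebuild one field line as
# leading spaces + lower-cased field name + padding + " = {" + rest of line
format_field <- function(line, width, leading_spaces = 2L) {
  parts <- regmatches(line, regexec("^\\s*(\\w+)\\s*=\\s*\\{(.*)$", line))[[1]]
  if (length(parts) == 0L) return(line)  # not a field line: leave untouched
  name <- tolower(parts[2])
  pad <- strrep(" ", max(0L, width - nchar(name)))
  paste0(strrep(" ", leading_spaces), name, pad, " = {", parts[3])
}

# Made-up input standing in for the lines of a .bib file
bib_demo <- c("@Article{key,", "AUTHOR={Tony Wood},", "  title = {A Report},", "}")
is_field_demo <- grepl("=", bib_demo, fixed = TRUE)
width_demo <- nchar(trimws(sub("=.*$", "", bib_demo)))
widest_demo <- max(width_demo[is_field_demo])

out_demo <- bib_demo
out_demo[is_field_demo] <- vapply(bib_demo[is_field_demo], format_field,
                                  character(1), width = widest_demo,
                                  USE.NAMES = FALSE)
out_demo
```

Because the case conversion happens in tolower() rather than inside the regex engine, this route sidesteps the replacement-string escapes entirely; strrep() requires R 3.3.0 or later.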
My sessionInfo details:
> sessionInfo()
R Under development (unstable) (2017-06-19 r72808)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] gsubfn_0.6-6 proto_1.0.0
loaded via a namespace (and not attached):
[1] compiler_3.5.0 tools_3.5.0 tcltk_3.5.0