I have a data.frame with ids composed of sequences of alphanumeric characters (e.g., id = c(A001, A002, B013)). I was looking for a function in stringr or stringi that would easily do math with these strings (id + 1 should return c(A002, A003, B014)).
I made a custom function that does the trick; however, I have a feeling that there must be a better, more efficient, or package-provided way to achieve this.
library(dplyr)
library(tidyr)

str_add_n <- function(df, string, n, width = 3) {
  string <- enquo(string)
  ## split the id into its letter prefix and its numeric part
  df <- df %>%
    separate(!!string,
             into = c("text", "num"),
             sep = "(?<=[A-Za-z])(?=[0-9])",
             remove = FALSE) %>%
    ## add n to the numeric part, then restore the zero padding
    mutate(num = as.numeric(num),
           num = num + n,
           num = stringr::str_pad(as.character(num),
                                  width = width,
                                  side = "left",
                                  pad = "0")) %>%
    ## glue prefix and number back together
    unite(next_string, text:num, sep = "")
  return(df)
}
Let's make a toy df
df <- data.frame(id = c("A001", "A002", "B013"))
str_add_n(df, id, 1)
id next_string
1 A001 A002
2 A002 A003
3 B013 B014
Again, this works, but I'm wondering if there's a better way to do this. All tweaks welcome!
Based on the suggested answers, I ran a benchmark. The two come very close, and I lean towards str_add_n_2 (I renamed the answer's function so that both could run, and added the suggested x <- as.character(x)).
microbenchmark::microbenchmark(question = str_add_n(df, id, 1),
answer = df %>% mutate_at(vars(id), funs(str_add_n_2(., 1))),
string_add = df %>% mutate_at(vars(id), funs(string_add(as.character(.)))))
Which yields
Unit: milliseconds
       expr      min       lq     mean   median       uq      max neval cld
   question 4.312094 4.448391 4.695276 4.570860 4.755748 10.29253   100   c
     answer 2.932146 3.017874 3.191262 3.117627 3.240688  8.24967   100 a
 string_add 3.388442 3.466466 3.699363 3.534416 3.682762  9.05441   100  b
More tweaks are welcome!
Here is a way with gsubfn:
id <- c("A001", "A002", "B013")
library(gsubfn)
gsubfn("([0-9]+)", function(x) sprintf("%03.0f", as.numeric(x) + 1), id)
#[1] "A002" "A003" "B014"
You could wrap it in a function:
string_add <- function(string, add = 1, width = 3) {
gsubfn::gsubfn("([0-9]+)", function(x) sprintf(paste0("%0", width, ".0f"), as.numeric(x) + add), string)
}
string_add(id, add = 10, width = 5)
#[1] "A00011" "A00012" "B00023"
I'd suggest it's easier to define the function on a vector of strings rather than hard-coding it to look for columns in a frame; for the latter, you can always use something like mutate_at(vars(id,...), funs(str_add_n)).
str_add_n <- function(x, n = 1L) {
gr <- gregexpr("\\d+", x)
reg <- regmatches(x, gr)
widths <- nchar(reg)
regmatches(x, gr) <- sprintf(paste0("%0", widths, "d"), as.integer(reg) + n)
x
}
vec <- c("A001", "A002", "B013")
str_add_n(vec)
# [1] "A002" "A003" "B014"
If in a frame:
df <- data.frame(id = c("A001", "A002", "B013"), x = 1:3,
stringsAsFactors = FALSE)
library(dplyr)
df %>%
mutate_at(vars(id), funs(str_add_n(., 3)))
# id x
# 1 A004 1
# 2 A005 2
# 3 B016 3
Caveat: this silently requires true character, not factor ... a possible defensive tactic might be to add x <- as.character(x) in the function definition.
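To illustrate that defensive tactic, here is a minimal sketch of a guarded variant. The name str_add_n_safe is made up for this example, and it assumes each id contains a single run of digits, so it uses regexpr (first match only) rather than gregexpr. Treat it as one possible shape, not a drop-in replacement for the function above.

```r
# Hypothetical variant: str_add_n_safe is an illustrative name only.
# Assumes each id holds exactly one run of digits.
str_add_n_safe <- function(x, n = 1L) {
  x <- as.character(x)               # coerce factors to character first
  m <- regexpr("[0-9]+", x)          # locate the first run of digits
  digits <- regmatches(x, m)         # the matched digit strings
  widths <- nchar(digits)            # keep each id's zero-padded width
  regmatches(x, m) <- sprintf(paste0("%0", widths, "d"),
                              as.integer(digits) + as.integer(n))
  x
}

ids <- factor(c("A001", "A002", "B013"))
str_add_n_safe(ids)
# [1] "A002" "A003" "B014"
```

Because the coercion happens inside the function, the same call works whether the column came from data.frame() with or without stringsAsFactors = FALSE.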