This should be simple, but it has turned out to be difficult to solve. I have data that are grouped by their trailing decimals (an artifact of an upstream data source). For example, a value in group "3" looks like 0.00003, while a value in group "10" looks like 24.00010. However, when I run either my regexpr code or my str_sub code, R appears to drop the trailing 0.
Example Data
df <- data.frame(a = c(0.00003, 0.00010, 24.00003, 24.00010))
print(df)
a
1 0.00003
2 0.00010
3 24.00003
4 24.00010
Desired Output
a group
1 0.00003 group03
2 0.00010 group10
3 24.00003 group03
4 24.00010 group10
Failed Attempt 1
df %>% mutate(group = paste0("group", regmatches(a, regexpr("(\\d{2}$)", a))))
a group
1 0.00003 group03
2 0.00010 group01
3 24.00003 group03
4 24.00010 group01
This failure is peculiar, because the pattern (\d{2}$) works when I test it on https://regexr.com/.
Failed Attempt 2
df %>% mutate(group = paste0("group", str_sub(a, start = -2)))
a group
1 0.00003 group03
2 0.00010 group01
3 24.00003 group03
4 24.00010 group01
The key here is that when you take a substring or extract with a regex, the number is first converted to a string. That string, however, does not keep the format you are expecting: trailing zeros are dropped, and very small values may be written in scientific notation.
library(tidyverse)
tibble(a = c(0.00003, 0.00010, 24.00003, 24.00010)) %>%
  mutate(group1 = paste0("group", str_extract(sprintf("%.5f", a), "\\d{2}$")),
         group2 = paste0("group", str_extract(a, "\\d{2}$")),
         sprint_char = sprintf("%.5f", a),
         char = as.character(a))
#> # A tibble: 4 x 5
#> a group1 group2 sprint_char char
#> <dbl> <chr> <chr> <chr> <chr>
#> 1 0.00003 group03 group05 0.00003 3e-05
#> 2 0.0001 group10 group04 0.00010 1e-04
#> 3 24.0 group03 group03 24.00003 24.00003
#> 4 24.0 group10 group01 24.00010 24.0001
Note that as.character(a) does not preserve the way a is displayed. Instead, fix the format explicitly with sprintf, and then extract the text that you want.
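The same idea can be sketched in base R, without any packages. This is only a minimal illustration: it assumes, as in the question, that every value carries exactly five meaningful decimal places, and the variable names (x, default_chr, fixed_chr, last_two) are made up for the example.

```r
# Base-R sketch; x is the example data copied from the question.
x <- c(0.00003, 0.00010, 24.00003, 24.00010)

# Default conversion: trailing zeros are dropped and small values may
# come back in scientific notation, so the last two characters are unreliable.
default_chr <- as.character(x)

# Fixed format: always exactly five digits after the decimal point.
# (Assumption: five decimals is the right width for this data.)
fixed_chr <- sprintf("%.5f", x)

# Take the last two characters of each fixed-format string.
n <- nchar(fixed_chr)
last_two <- substr(fixed_chr, n - 1, n)

group <- paste0("group", last_two)
print(data.frame(x, default_chr, fixed_chr, group))
```

Comparing the default_chr and fixed_chr columns side by side makes it easy to see which rows would have been mislabeled.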
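We can format the number as a fixed-format character string with sprintf and then use str_sub. Also, make sure the scipen option is set so that numbers are not printed in scientific notation.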
options(scipen = 999)
library(stringr)
library(dplyr)
df %>%
  mutate(group = paste0("group", str_sub(sprintf("%2.5f", a), start = -2)))
# a group
#1 0.00003 group03
#2 0.00010 group10
#3 24.00003 group03
#4 24.00010 group10
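As a possible alternative that avoids string conversion of the number altogether, the group code could be computed arithmetically. This is a different technique from the one above, offered only as a sketch: it assumes the group code always occupies the fourth and fifth decimal places and that nothing meaningful follows them. The names a, code, and group are illustrative.

```r
# Arithmetic sketch: shift five decimals into the integer part,
# then keep the last two digits with the modulo operator.
a <- c(0.00003, 0.00010, 24.00003, 24.00010)

# round() absorbs floating-point noise (e.g. 10.000000000000002 -> 10).
code <- as.integer(round(a * 1e5) %% 100)

# %02d pads single-digit codes with a leading zero: 3 -> "03".
group <- sprintf("group%02d", code)

print(data.frame(a, group))
```

Because the two-digit padding is applied only at the end, the intermediate code column stays numeric and can be used directly for sorting or joining.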