I have a simple dataframe:
df <- data.frame(test = c("test_A_1_1.txt", "test_A_2_1.txt", "test_A_3_1.txt"), value = c(0.51, 0.52, 0.56))
test value
1 test_A_1_1.txt 0.51
2 test_A_2_1.txt 0.52
3 test_A_3_1.txt 0.56
Expected output
I would like to copy the numbers at the end of the string in column 1 and place them in columns three and four, respectively, like this:
test value new new
1 test_A_1_1.txt 0.51 1 1
2 test_A_2_1.txt 0.52 2 1
3 test_A_3_1.txt 0.56 3 1
Attempt
Using the following code, I am able to extract the numbers from the string:
library(stringr)
as.numeric(str_extract_all("test_A_3_1.txt", "[0-9]+")[[1]])[1] # Extracts the first number
as.numeric(str_extract_all("test_A_3_1.txt", "[0-9]+")[[1]])[2] # Extracts the second number
I would like to apply this code to all the values of the first column:
library(tidyverse)
df %>% mutate(new = as.numeric(str_extract_all(df$test, "[0-9]+")[[1]])[1])
However, this leads to a column new that contains only the number 1.
What am I doing wrong?
We can use parse_number from readr:
library(dplyr)
library(purrr)
library(stringr)
df %>%
mutate(new = readr::parse_number(as.character(test)))
Regarding the OP's issue: the code selects only the first list element ([[1]]) of the result of str_extract_all (which returns a list, one element per string), so the same value is recycled for every row. Instead, it is better to use str_extract, as we need to extract only the first instance of one or more digits ([0-9]+, equivalently \\d+) from each string:
df %>%
mutate(new = as.numeric(str_extract(test, "[0-9]+")))
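To make the difference concrete, here is a minimal sketch that runs both approaches on a plain character vector outside of mutate. The variable names (test, all_matches, first_only, per_row) are only illustrative and are not part of the original question.

```r
library(stringr)

test <- c("test_A_1_1.txt", "test_A_2_1.txt", "test_A_3_1.txt")

# str_extract_all() returns a list with one character vector per input string
all_matches <- str_extract_all(test, "[0-9]+")

# [[1]] keeps only the matches of the first string and [1] keeps only its
# first match, so this is a single value that mutate() would recycle
first_only <- as.numeric(all_matches[[1]])[1]

# str_extract() is vectorised and returns the first match of every string
per_row <- as.numeric(str_extract(test, "[0-9]+"))

print(first_only)
print(per_row)
```

Because first_only has length one, assigning it inside mutate fills every row with the same value, which is the behaviour described in the question.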
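If we need to start from the output of str_extract_all, unlist the list to a vector and then apply as.numeric to that vector. Note that this only works when each string contains exactly one match; otherwise the unlisted vector is longer than the number of rows and mutate() throws an error: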
df %>%
mutate(new = as.numeric(unlist(str_extract_all(test, "[0-9]+"))))
If there are multiple instances, then keep it as a list column after converting to numeric by looping through the list elements with map:
df %>%
mutate(new = map(str_extract_all(test, "[0-9]+"), as.numeric))
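If the list column later has to be split into ordinary numeric columns, one possible sketch is to pull elements out by position with map_dbl. This assumes every string contains at least two numbers; the names nums, first_num and second_num are hypothetical.

```r
library(purrr)
library(stringr)

test <- c("test_A_1_1.txt", "test_A_2_1.txt", "test_A_3_1.txt")

# One numeric vector per string, as in the list-column approach
nums <- map(str_extract_all(test, "[0-9]+"), as.numeric)

# Passing an integer position instead of a function makes map_dbl()
# extract the element at that position from every list entry
# (assumption: each entry has at least two elements)
first_num <- map_dbl(nums, 1)
second_num <- map_dbl(nums, 2)

print(first_num)
print(second_num)
```

map_dbl stops with an error if a string has fewer numbers than the requested position, which is usually preferable to silently misaligned rows.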
NOTE: The str_extract based solution was first posted here.
In base R, we can use regexpr:
df$new <- as.numeric(regmatches(df$test, regexpr("\\d+", df$test)))
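regexpr only finds the first match per string. For all matches, a base R sketch could use gregexpr instead; it assumes that every string contains exactly two numbers, and the column names new1 and new2 are chosen here only for illustration.

```r
df <- data.frame(
  test = c("test_A_1_1.txt", "test_A_2_1.txt", "test_A_3_1.txt"),
  value = c(0.51, 0.52, 0.56)
)

# gregexpr() locates every run of digits; regmatches() then returns a list
# with one character vector of matches per string
nums <- regmatches(df$test, gregexpr("[0-9]+", df$test))

# Assumption: each string yields exactly two numbers, so the list can be
# bound row-wise into a 3 x 2 numeric matrix
mat <- do.call(rbind, lapply(nums, as.numeric))

df$new1 <- mat[, 1]
df$new2 <- mat[, 2]

print(df)
```

If the strings can contain differing counts of numbers, rbind would recycle values, so the counts should be checked first (for example with lengths(nums)).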
With the updated example, if we need to get both numbers, the first one can be extracted with str_extract and the last one with a regex lookahead that matches digits only when they are followed by a . and 'txt' (stri_extract_last from stringi can be used as well):
df %>%
mutate(new1 = as.numeric(str_extract(test, "\\d+")),
new2 = as.numeric(str_extract(test, "\\d+(?=\\.txt)")))
# test value new1 new2
#1 test_A_1_1.txt 0.51 1 1
#2 test_A_2_1.txt 0.52 2 1
#3 test_A_3_1.txt 0.56 3 1
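The stringi alternative mentioned above could look roughly like the following sketch, which takes the first and last run of digits without a lookahead. It uses stri_extract_first_regex and stri_extract_last_regex from stringi; the variable names are illustrative.

```r
library(stringi)

test <- c("test_A_1_1.txt", "test_A_2_1.txt", "test_A_3_1.txt")

# First run of digits in each string
first_num <- as.numeric(stri_extract_first_regex(test, "[0-9]+"))

# Last run of digits in each string; no lookahead is needed because the
# file extension contains no digits in this example
last_num <- as.numeric(stri_extract_last_regex(test, "[0-9]+"))

print(first_num)
print(last_num)
```

This relies on the number of interest being the last run of digits in the string; if the extension itself could contain digits (for example .mp4), the lookahead version is the safer choice.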
Slightly modifying your existing code:
df %>%
mutate(new = as.integer(str_extract(test, "[0-9]+")))
Or simply
df$new <- as.integer(str_extract(df$test, "[0-9]+"))