I have a string in a variable which we call v1. This string states picture numbers and takes the form of "Pic 27 + 28". I want to extract the first number and store it in a new variable called item.
Some code that I've tried is:
item <- unique(na.omit(as.numeric(unlist(strsplit(unlist(v1),"[^0-9]+")))))
This worked fine, until I came upon a list that went:
[1,] "Pic 26 + 25"
[2,] "Pic 27 + 28"
[3,] "Pic 28 + 27"
[4,] "Pic 29 + 30"
[5,] "Pic 30 + 29"
[6,] "Pic 31 + 32"
At this point I get more numbers than I want, as it is also grabbing other unique numbers (the 25).
I've actually tried doing it with gsub, but got nothing to work. Help would be appreciated greatly!
In this method for extracting numbers from a character vector, the user calls gsub(), one of R's built-in functions, and passes it a pattern matching the first occurrence of a number in each string, together with the character vector itself, and in return, this ...
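A minimal sketch of a gsub() call along those lines, assuming every string contains at least one number. The pattern is an illustration only, not necessarily the one the description above had in mind, and `first_num` is a name made up for this example.

```r
# Sample data taken from the question
v1 <- c("Pic 26 + 25", "Pic 27 + 28", "Pic 28 + 27")

# ^\\D*   skips any leading non-digits
# (\\d+)  captures the first run of digits
# .*$     consumes the rest of the string
# The whole string is then replaced by the captured group "\\1".
first_num <- as.numeric(gsub("^\\D*(\\d+).*$", "\\1", v1))
first_num
# [1] 26 27 28
```

Because the pattern is anchored at both ends, it matches each string exactly once, so sub() would behave the same way here.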
To get the first n characters from a string, we can use the built-in substr() function in R. The substr() function takes 3 arguments: the first is the string, the second is the start position, and the third is the stop position. Note: unlike str_sub() from stringr, base R's substr() does not count negative positions backward from the last character.
The substr() and substring() functions in R can be used either to extract parts of character strings or to change the values of parts of character strings. They work element-wise, so they apply equally to a character vector or to a data frame column.
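A short sketch of substr() used both ways, for extraction and for replacement. Note that position-based extraction only finds the first number under an assumption that may not hold in general: the "Pic " prefix must be exactly four characters long and the number exactly two digits. The variables `x` and `y` are made up for this example.

```r
x <- "Pic 27 + 28"

# First 3 characters: start at position 1, stop at position 3
substr(x, 1, 3)
# [1] "Pic"

# Characters 5 to 6 hold the first number, but only while the prefix
# is four characters long and the number has two digits
as.numeric(substr(x, 5, 6))
# [1] 27

# substr() on the left-hand side replaces part of the string
y <- x
substr(y, 1, 3) <- "Img"
y
# [1] "Img 27 + 28"
```

For input where the number's position or width can vary, a regular expression is the safer choice.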
With str_extract from stringr:
library(stringr)
v1 <- c("Pic 26 + 25", "Pic 27 + 28", "Pic 28 + 27",
        "Pic 29 + 30", "Pic 30 + 29", "Pic 31 + 32")
str_extract(v1, "[0-9]+")
# [1] "26" "27" "28" "29" "30" "31"
In the responses below we use this test data:
# test data
v1 <- c("Pic 26 + 25", "Pic 27 + 28", "Pic 28 + 27", "Pic 29 + 30",
"Pic 30 + 29", "Pic 31 + 32")
1) gsubfn
library(gsubfn)
strapply(v1, "(\\d+).*", as.numeric, simplify = c)
## [1] 26 27 28 29 30 31
2) sub: This requires no packages but does involve a slightly longer regular expression:
as.numeric( sub("\\D*(\\d+).*", "\\1", v1) )
## [1] 26 27 28 29 30 31
3) read.table: This involves no regular expressions or packages:
read.table(text = v1, fill = TRUE)[[2]]
## [1] 26 27 28 29 30 31
In this particular example the fill = TRUE argument could be omitted, but it might be needed if the components of v1 had a differing number of fields.
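To illustrate when fill = TRUE matters, here is a small sketch using a hypothetical vector `v2` (not part of the original data) in which one element has only a single picture number:

```r
# Hypothetical input: the second element has fewer fields than the others
v2 <- c("Pic 26 + 25", "Pic 27", "Pic 28 + 27")

# Without fill = TRUE, read.table() stops with an error because line 2
# does not have as many fields as the other lines. With fill = TRUE the
# short line is padded, and the second column still holds the first number.
res <- read.table(text = v2, fill = TRUE)[[2]]
res
# [1] 26 27 28
```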