R extract first number from string

I have a string in a variable which we call v1. This string states picture numbers and takes the form of "Pic 27 + 28". I want to extract the first number and store it in a new variable called item.

Some code that I've tried is:

item <- unique(na.omit(as.numeric(unlist(strsplit(unlist(v1),"[^0-9]+")))))

This worked fine, until I came upon a list that went:

[1,] "Pic 26 + 25"
[2,] "Pic 27 + 28"
[3,] "Pic 28 + 27"
[4,] "Pic 29 + 30"
[5,] "Pic 30 + 29"
[6,] "Pic 31 + 32"

At this point I get more numbers than I want, as it is also grabbing other unique numbers (the 25).
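The extra numbers appear because unlist() pools the pieces from every string into one vector before unique() is applied, so "first number per string" is lost. A small sketch of the difference, assuming v1 is a plain character vector holding the six strings above (the names pooled and per_string are illustrative only):

```r
v1 <- c("Pic 26 + 25", "Pic 27 + 28", "Pic 28 + 27",
        "Pic 29 + 30", "Pic 30 + 29", "Pic 31 + 32")

# Original approach: all pieces are pooled, then de-duplicated,
# so 25 and 32 survive even though they are never the first number.
pooled <- unique(na.omit(as.numeric(unlist(strsplit(v1, "[^0-9]+")))))
pooled
## [1] 26 25 27 28 29 30 31 32

# Keeping the split per string and taking the first non-empty piece
# gives one number per element.
per_string <- sapply(strsplit(v1, "[^0-9]+"),
                     function(p) as.numeric(p[nzchar(p)][1]))
per_string
## [1] 26 27 28 29 30 31
```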

I've actually tried doing it with gsub, but got nothing to work. Help would be appreciated greatly!

kneijenhuijs asked Apr 27 '14 12:04



2 Answers

With str_extract from stringr:

library(stringr)

v1 = c("Pic 26 + 25", "Pic 27 + 28", "Pic 28 + 27", 
       "Pic 29 + 30", "Pic 30 + 29", "Pic 31 + 32")

str_extract(v1, "[0-9]+")
# [1] "26" "27" "28" "29" "30" "31"
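
Note that the result is a character vector. If numbers are wanted, or stringr is not available, the same idea can be sketched in base R with regexpr() and regmatches(). This assumes every element contains at least one digit, because regmatches() silently drops elements that do not match:

```r
v1 <- c("Pic 26 + 25", "Pic 27 + 28", "Pic 28 + 27",
        "Pic 29 + 30", "Pic 30 + 29", "Pic 31 + 32")

# regexpr() locates only the first match in each string;
# regmatches() pulls out the matched text.
m <- regexpr("[0-9]+", v1)
item <- as.numeric(regmatches(v1, m))
item
# [1] 26 27 28 29 30 31
```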
acylam answered Sep 22 '22 08:09


In the responses below we use this test data:

# test data
v1 <- c("Pic 26 + 25", "Pic 27 + 28", "Pic 28 + 27", "Pic 29 + 30", 
"Pic 30 + 29", "Pic 31 + 32")

1) gsubfn

library(gsubfn)

strapply(v1, "(\\d+).*", as.numeric, simplify = c)
## [1] 26 27 28 29 30 31

2) sub This requires no packages but does involve a slightly longer regular expression:

as.numeric( sub("\\D*(\\d+).*", "\\1", v1) )
## [1] 26 27 28 29 30 31
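
In that pattern, \\D* consumes any leading non-digits, (\\d+) captures the first run of digits, and .* consumes the rest, so the whole string is replaced by the captured group. One caveat: when a string contains no digits, sub() returns it unchanged and as.numeric() then yields NA with a warning. A possible guard, as a sketch only (the vector x and the name has_digit are made up for illustration):

```r
x <- c("Pic 26 + 25", "Pic", "no 7 here 8")

# Only run the substitution on elements that contain a digit;
# everything else stays NA without triggering a coercion warning.
has_digit <- grepl("\\d", x)
item <- rep(NA_real_, length(x))
item[has_digit] <- as.numeric(sub("\\D*(\\d+).*", "\\1", x[has_digit]))
item
## [1] 26 NA  7
```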

3) read.table This involves no regular expressions or packages:

read.table(text = v1, fill = TRUE)[[2]]
## [1] 26 27 28 29 30 31

In this particular example the fill=TRUE could be omitted but it might be needed if the components of v1 had a differing number of fields.
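
To illustrate when fill = TRUE matters, here is a small sketch with a made-up vector v2 whose second element has fewer fields than the first. Without fill = TRUE, read.table stops with an error about the short line; with it, the missing fields are padded:

```r
# v2 is hypothetical test data, not from the question
v2 <- c("Pic 26 + 25", "Pic 27")

# Without fill = TRUE the short second line causes an error
no_fill <- tryCatch(read.table(text = v2), error = function(e) "error")

# With fill = TRUE the second column is still read as numbers
with_fill <- read.table(text = v2, fill = TRUE)[[2]]
with_fill
## [1] 26 27
```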

G. Grothendieck answered Sep 24 '22 08:09