
Using str_extract in R to extract a number before a substring with regex

I would like to use str_extract from the stringr package to extract the number from strings of the form "XX nights ...".

I'm currently doing this:

library(stringr)

str_extract("17 nights$5 Days", "(\\d)+ nights")

but that returns

"17 nights"

instead of 17.

How can I extract just the number? I thought specifying the extract group with parentheses would work, but it doesn't.

Harry M asked Aug 10 '19 00:08


3 Answers

You can use a lookahead assertion, (?=...), in the regular expression:

library(stringr)

str_extract("17 nights$5 Days", "(\\d)+(?= nights)")

(\d) - a digit
(\d)+ - one or more digits
(?= nights) - followed by " nights", which is checked but not included in the match

The lookbehind, (?<=...), can also come in handy when the number follows a known prefix.
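
As a rough illustration of the lookbehind, here is a minimal sketch that pulls the number after the dollar sign from the same example string. The variable names are only for illustration, and the lookahead call is repeated for comparison.

```r
library(stringr)

x <- "17 nights$5 Days"

# Lookbehind: digits that are preceded by a literal dollar sign.
days <- str_extract(x, "(?<=\\$)\\d+")

# Lookahead, for comparison: digits that are followed by " nights".
nights <- str_extract(x, "\\d+(?= nights)")

print(days)
print(nights)
```

Both calls return character strings, so wrap the result in as.integer() if a number is needed.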

RStudio's regex cheat sheet is a good reference: https://raw.githubusercontent.com/rstudio/cheatsheets/main/regex.pdf

Dave2e answered Oct 06 '22 01:10


If you want to return a specific capture group, use str_replace(). Wrap the part of the pattern you want to keep in (), then refer to it in the replacement argument as "\\1", since it is capture group number one. The trailing .* matches the rest of the string, so the whole string is replaced by the group alone.

I added the ^ to indicate that you want only the digits at the beginning of the string.


library(stringr)

str_replace(string = "17 nights$5 Days",
            pattern = "(^\\d+).*",
            replacement = "\\1")

giving:

[1] "17"
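
If the goal is to read a capture group directly rather than replace around it, stringr's str_match() is another option. The sketch below assumes the same example string; the variable names are illustrative.

```r
library(stringr)

x <- "17 nights$5 Days"

# str_match() returns a character matrix: column 1 holds the full match,
# column 2 holds the first capture group.
m <- str_match(x, "(\\d+) nights")
nights <- m[, 2]

print(nights)
```

Unlike str_replace(), which returns the input unchanged when the pattern does not match, str_match() gives NA in that case.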

Jeremy Allen answered Oct 06 '22 01:10


In base R, we can use sub to extract the number that comes before "nights":

as.integer(sub("(\\d+)\\s+nights.*", "\\1", "17 nights$5 Days"))
#[1] 17
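
Note that sub() only replaces the matched part, so any text before the number would be kept. A possible base R alternative is regexpr() with regmatches(), sketched below. The second input string is a made-up example to show that the calls are vectorised.

```r
# Hypothetical input vector; only the first element comes from the question.
x <- c("17 nights$5 Days", "3 nights$1 Day")

# perl = TRUE enables lookahead in base R regular expressions.
m <- regexpr("\\d+(?=\\s+nights)", x, perl = TRUE)

# regmatches() returns only the matched text; elements without a match are dropped.
nights <- as.integer(regmatches(x, m))

print(nights)
```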

Or, if the number is always the first number in the string, we can use readr::parse_number:

readr::parse_number("17 nights$5 Days")
#[1] 17

Ronak Shah answered Oct 06 '22 02:10