I would like to use str_extract from the stringr package to extract the number from strings of the form "XX nights" followed by other text.
I'm currently doing this:
library(stringr)
str_extract("17 nights$5 Days", "(\\d)+ nights")
but that returns
"17 nights"
instead of 17.
How can I extract just the number? I thought specifying the extract group with parentheses would work, but it doesn't.
You can use a lookahead in the regular expression, (?=...):
library(stringr)
str_extract("17 nights$5 Days", "(\\d)+(?= nights)")
(\d) - a digit
(\d)+ - one or more digits
(?= nights) - that comes directly in front of " nights" (the lookahead is checked but not included in the match)
The lookbehind, (?<=...), can also come in handy.
A good reference is the regex cheatsheet from RStudio: https://raw.githubusercontent.com/rstudio/cheatsheets/main/regex.pdf
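As a minimal sketch of the lookaround approach described above, the following applies a lookahead and a lookbehind to a small character vector and converts the results to integers. The second string and the variable names are made up for illustration; only the first string comes from the question.

```r
library(stringr)

# Hypothetical sample data; the second element is invented for illustration
x <- c("17 nights$5 Days", "3 nights$12 Days")

# Lookahead: one or more digits that are followed by " nights"
nights <- as.integer(str_extract(x, "\\d+(?= nights)"))

# Lookbehind: one or more digits that are preceded by a literal "$"
after_dollar <- as.integer(str_extract(x, "(?<=\\$)\\d+"))

print(nights)
print(after_dollar)
```

Because str_extract is vectorised, the same pattern works on a whole column; elements with no match come back as NA.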
If you want to return a specific capture group, use str_replace(). Wrap the part of the pattern you want to keep in (), then refer to it in the replacement argument as "\\1", since it is capture group number one. The trailing .* matches the rest of the string so that it gets replaced away.
I added the ^ to indicate that you only want digits at the beginning of the string.
library(stringr)
str_replace(string = "17 nights$5 Days",
pattern = "(^\\d+).*",
replacement = "\\1")
giving:
[1] "17"
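If the goal is to read a capture group directly rather than replace around it, stringr's str_match is another option. This is a sketch rather than part of the original answer: str_match returns a character matrix whose first column is the full match and whose later columns are the capture groups. The variable names here are illustrative.

```r
library(stringr)

# Column 1 holds the full match, column 2 the first capture group
m <- str_match("17 nights$5 Days", "(\\d+) nights")

full_match <- m[, 1]
nights <- as.integer(m[, 2])

print(full_match)
print(nights)
```

Unlike the str_replace approach, a string that does not match yields NA in every column instead of being returned unchanged.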
In base R, we can use sub to extract the number that comes before "nights":
as.integer(sub("(\\d+)\\s+nights.*", "\\1","17 nights$5 Days"))
#[1] 17
Or, if the number you want is always the first number in the string, we can use readr::parse_number:
readr::parse_number("17 nights$5 Days")
#[1] 17
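One caveat with the sub call above is that it returns the input unchanged when the pattern does not match, so as.integer then produces NA with a warning. The sketch below is an alternative in base R, assuming you want a clean NA for non-matching strings; it uses regexec and regmatches to pull out the capture group. The second sample string and the helper logic are made up for illustration.

```r
# Hypothetical sample data; the second element has no "nights" count
x <- c("17 nights$5 Days", "no stay listed")

# regexec returns match positions including capture groups;
# regmatches turns them into a list of character vectors
m <- regmatches(x, regexec("(\\d+)\\s+nights", x))

# Element 1 of each vector is the full match, element 2 the first group;
# non-matching strings give a zero-length vector, mapped to NA here
nights <- vapply(
  m,
  function(g) if (length(g) >= 2) as.integer(g[2]) else NA_integer_,
  integer(1)
)

print(nights)
```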