What is the cleanest way to find a pattern such as ": [1-9]*" in a string and keep only the matching part?
regexec gives me the starting positions, but is there a cleaner way to get the matched value directly?
For example:
test <- c("surface area: 458", "bedrooms: 1", "whatever")
regexec(": [1-9]*", test)
How do I get this result directly?
c(": 458", ": 1", NA)
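For reference, the regexec() result does not have to be processed by hand: regmatches() accepts it directly and returns a list, with character(0) for elements that do not match. Below is a minimal base R sketch that turns that list into the requested vector (the vapply() conversion is an illustrative addition, not part of the question). It also assumes the pattern should be ": [0-9]+", because [1-9]* stops at a 0, so "bedrooms: 10" would give ": 1".

```r
test <- c("surface area: 458", "bedrooms: 1", "whatever")

# regexec() finds the match positions; regmatches() turns them into strings.
# Non-matching elements come back as character(0).
m <- regmatches(test, regexec(": [0-9]+", test))

# Keep the whole match where there is one, NA otherwise.
out <- vapply(m, function(x) if (length(x) > 0) x[1] else NA_character_,
              character(1))
out
# gives c(": 458", ": 1", NA)
```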
You can use base R, which handles this just fine.
> x <- c('surface area: 458', 'bedrooms: 1', 'whatever')
> r <- regmatches(x, gregexpr(':.*', x))
> unlist({r[sapply(r, length)==0] <- NA; r})
# [1] ": 458" ": 1" NA
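If only the first match per element is needed, regexpr() (rather than gregexpr()) avoids the list-and-unlist step. This sketch starts from an all-NA vector and writes the matches into the positions that matched; it relies on regmatches() returning one string per successful regexpr() match, in the original order.

```r
x <- c('surface area: 458', 'bedrooms: 1', 'whatever')

# regexpr() gives the first match position per element (-1 when there is none).
m <- regexpr(':.*', x)

# Start from all-NA and fill in only the elements that matched.
out <- rep(NA_character_, length(x))
out[m != -1] <- regmatches(x, m)
out
# gives c(": 458", ": 1", NA)
```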
That said, I find it much simpler to just do:
> x <- c('surface area: 458', 'bedrooms: 1', 'whatever')
> sapply(strsplit(x, '\\b(?=:)', perl = TRUE), '[', 2)
# [1] ": 458" ": 1" NA
Using stringr:
library(stringr)
str_extract(test, ":.*")
#[1] ": 458" ": 1" NA
Or, for a faster approach, use stringi:
library(stringi)
stri_extract_first_regex(test, ":.*")
#[1] ": 458" ": 1" NA
If you need to keep the original value for elements that do not match:
gsub(".*(:.*)", "\\1", test)
#[1] ": 458" ": 1" "whatever"
Try any of these. The first two use only base R; the last two use the gsubfn package. The last one assumes that we want to return a numeric vector.
1) sub
s <- sub(".*:", ":", test)
ifelse(test == s, NA, s)
## [1] ": 458" ": 1" NA
If there can be more than one : in a string, replace the pattern with "^[^:]*:".
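To see why the pattern matters, here is a hypothetical input with two colons (the string is made up for illustration). The greedy .* runs to the last colon, while ^[^:]* stops at the first one.

```r
y <- "opening time: 10:30"  # hypothetical input with two colons

# Greedy: ".*:" matches up to the LAST colon, so only ":30" is kept.
greedy <- sub(".*:", ":", y)

# "^[^:]*:" matches up to the FIRST colon and keeps everything after it.
first <- sub("^[^:]*:", ":", y)

greedy  # ":30"
first   # ": 10:30"
```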
2) strsplit
sapply(strsplit(test, ":"), function(x) c(paste0(":", x), NA)[2])
## [1] ": 458" ": 1" NA
Do not use this one if there can be more than one : in a string.
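If several colons are possible, one alternative (an illustrative variant, not part of the answer above) is to locate the first colon with regexpr(..., fixed = TRUE) and keep everything from there with substring(). The fourth input string is hypothetical and only shows the multi-colon case.

```r
test <- c("surface area: 458", "bedrooms: 1", "whatever", "opening time: 10:30")

# Position of the first ":" in each element (-1 when absent);
# as.vector() drops the match.length and other attributes.
pos <- as.vector(regexpr(":", test, fixed = TRUE))

# Keep from the first colon to the end; NA where there is no colon.
out <- ifelse(pos > 0, substring(test, pos), NA)
out
# gives c(": 458", ": 1", NA, ": 10:30")
```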
3) strapplyc
library(gsubfn)
s <- strapplyc(test, "(:.*)|$", simplify = TRUE)
ifelse(s == "", NA, s)
## [1] ": 458" ": 1" NA
We can omit the ifelse line if "" is OK instead of NA.
4) strapply
If the idea is really that there are some digits on the line and we want to return the numbers or NA, then try this:
library(gsubfn)
strapply(test, "\\d+|$", as.numeric, simplify = TRUE)
## [1] 458 1 NA
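Without gsubfn, a base R sketch can produce the same numeric result: capture the first run of digits with sub() and convert, leaving NA where there are no digits. Like the strapply version, this assumes only the first number on each line matters.

```r
test <- c("surface area: 458", "bedrooms: 1", "whatever")

# sub() leaves non-matching strings unchanged, so flag them separately.
has_digits <- grepl("[0-9]", test)

# Capture the first run of digits and drop everything around it.
digits <- sub("^[^0-9]*([0-9]+).*$", "\\1", test)

# Convert to numbers, with NA for elements that contain no digits.
num <- as.numeric(ifelse(has_digits, digits, NA))
num
# gives c(458, 1, NA)
```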