
How to look for a certain part in a string and only keep that part

Tags: r

What is the cleanest way to find a pattern such as ": [1-9]*" in a string and keep only the matching part?

You can use regexec to get the match positions, but isn't there a cleaner way to get the matched value directly?

For example:

test <- c("surface area: 458", "bedrooms: 1", "whatever")
regexec(": [1-9]*", test)

How do I directly get just

c(": 458",": 1", NA )
Kasper Van Lombeek asked Aug 31 '14 17:08




3 Answers

You can use base R which handles this just fine.

> x <- c('surface area: 458', 'bedrooms: 1', 'whatever')
> r <- regmatches(x, gregexpr(':.*', x))   # list; character(0) where nothing matched
> r[sapply(r, length) == 0] <- NA          # mark the non-matching elements
> unlist(r)
# [1] ": 458" ": 1"   NA

Although, I find it much simpler to just do...

> x <- c('surface area: 458', 'bedrooms: 1', 'whatever')
> sapply(strsplit(x, '\\b(?=:)', perl = TRUE), '[', 2)
# [1] ": 458" ": 1"   NA 
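
A variant closer to the pattern in the question, as a minimal sketch. It assumes the intent was "a colon, a space, then one or more digits", so it uses ": [0-9]+" rather than the question's ": [1-9]*" (which cannot match a 0 and also matches a bare ": "). regexpr() reports -1 where nothing matched, which makes the NA fill explicit:

```r
# Same sample vector as in the question.
test <- c("surface area: 458", "bedrooms: 1", "whatever")

# regexpr() gives the start of the first match per element, or -1 if none.
# The pattern ": [0-9]+" is an assumption about what the asker wanted.
m <- regexpr(": [0-9]+", test)

# Pre-fill with NA, then overwrite only the elements that matched;
# regmatches() returns one string per matching element, in order.
out <- rep(NA_character_, length(test))
out[m != -1] <- regmatches(test, m)
out
# [1] ": 458" ": 1"   NA
```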
hwnd answered Oct 21 '22 05:10


library(stringr)
str_extract(test, ":.*")
#[1] ": 458" ": 1"   NA     

Or, for a faster approach, use stringi:

library(stringi)
stri_extract_first_regex(test, ":.*")
#[1] ": 458" ": 1"   NA     

If you need to keep the original value for elements that have no match:

gsub(".*(:.*)", "\\1", test)
#[1] ": 458"    ": 1"      "whatever"
akrun answered Oct 21 '22 05:10


Try any of these. The first two use only base R; the last two use the gsubfn package. The last one assumes that we want to return a numeric vector.

1) sub

s <- sub(".*:", ":", test)
ifelse(test == s, NA, s)
## [1] ": 458" ": 1"   NA   

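The pattern ".*:" is greedy, so if a string contains more than one : only the text from the last : onward is kept. To keep everything from the first : instead, replace the pattern with "^[^:]*:" .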
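
To illustrate the difference between the two patterns, here is a small sketch on a made-up string with two colons (the input "time: 12:30" is hypothetical, not from the question):

```r
# Hypothetical input containing two colons.
x <- "time: 12:30"

# Greedy ".*:" consumes everything up to the last colon.
from_last <- sub(".*:", ":", x)
from_last
# [1] ":30"

# "^[^:]*:" stops at the first colon, so the rest of the string survives.
from_first <- sub("^[^:]*:", ":", x)
from_first
# [1] ": 12:30"
```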

2) strsplit

sapply(strsplit(test, ":"), function(x) c(paste0(":", x), NA)[2])
## [1] ": 458" ": 1"   NA

Do not use this one if there can be more than one : in a string, because only the piece between the first and second : is returned.
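
A quick sketch of how the strsplit approach behaves with a second colon, again using a made-up input ("time: 12:30" is hypothetical):

```r
# Hypothetical input containing two colons.
x <- "time: 12:30"

# Splitting on ":" yields three pieces, not two.
parts <- strsplit(x, ":")[[1]]
parts
# [1] "time" " 12"  "30"

# Taking element 2 keeps only the middle piece; the ":30" tail is lost.
second <- c(paste0(":", parts), NA)[2]
second
# [1] ": 12"
```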

3) strapplyc

library(gsubfn)
s <- strapplyc(test, "(:.*)|$", simplify = TRUE)
ifelse(s == "", NA, s)
## [1] ": 458" ": 1"   NA

We can omit the ifelse line if "" is ok instead of NA.

4) strapply

If the idea is really that there are some digits on the line and we want to return the numbers or NA, then try this:

library(gsubfn)
strapply(test, "\\d+|$", as.numeric, simplify = TRUE)
## [1] 458   1  NA
G. Grothendieck answered Oct 21 '22 05:10