Extract a substring according to a pattern

Here are a few ways:

1) sub

sub(".*:", "", string)
## [1] "E001" "E002" "E003"

2) strsplit

sapply(strsplit(string, ":"), "[", 2)
## [1] "E001" "E002" "E003"

3) read.table

read.table(text = string, sep = ":", as.is = TRUE)$V2
## [1] "E001" "E002" "E003"

4) substring

This assumes the second portion always starts at the 4th character (which is the case in the example in the question):

substring(string, 4)
## [1] "E001" "E002" "E003"

4a) substring/regex

If the colon were not always in a known position we could modify (4) by searching for it:

substring(string, regexpr(":", string) + 1)
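
As a rough illustration of why (4a) is more robust than (4), here is a sketch using a hypothetical vector string2 (not from the question) whose prefixes have different lengths, so the colon is not always the 3rd character:

```r
# string2 is a made-up input for illustration only
string2 <- c("G1:E001", "G10:E002", "G100:E003")

# regexpr() gives the position of the first ":" in each element
pos <- as.integer(regexpr(":", string2))

fixed  <- substring(string2, 4)        # fixed offset, as in (4)
search <- substring(string2, pos + 1)  # offset found by searching, as in (4a)

fixed
## [1] "E001"   ":E002"  "0:E003"
search
## [1] "E001" "E002" "E003"
```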

5) strapplyc

strapplyc returns the parenthesized portion:

library(gsubfn)
strapplyc(string, ":(.*)", simplify = TRUE)
## [1] "E001" "E002" "E003"

6) read.dcf

This one only works if the substrings prior to the colon are unique (which they are in the example in the question). It also requires that the separator be a colon (which it is in the question). If a different separator were used, we could use sub to replace it with a colon first. For example, if the separator were _ then we would first run string <- sub("_", ":", string)

c(read.dcf(textConnection(string)))
## [1] "E001" "E002" "E003"
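
The separator swap described for (6) might look like the following sketch. The vector string_us is a hypothetical underscore-separated variant of the input, not something from the question:

```r
# string_us is a made-up variant of the input that uses "_" instead of ":"
string_us <- c("G1_E001", "G2_E002", "G3_E003")

# replace the first "_" in each element with ":" so read.dcf sees key:value lines
string_colon <- sub("_", ":", string_us)

con <- textConnection(string_colon)
result <- c(read.dcf(con))  # c() drops the matrix dimensions
close(con)

result
## [1] "E001" "E002" "E003"
```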

7) separate

7a) Using tidyr::separate we create a data frame with two columns, one for the part before the colon and one for after, and then extract the latter.

library(dplyr)
library(tidyr)

DF <- data.frame(string)
DF %>% 
  separate(string, into = c("pre", "post")) %>% 
  pull("post")
## [1] "E001" "E002" "E003"

7b) Alternatively, separate can be used to create just the post column; we then unlist and unname the resulting data frame:

library(dplyr)
library(tidyr)

DF %>% 
  separate(string, into = c(NA, "post")) %>% 
  unlist %>%
  unname
## [1] "E001" "E002" "E003"

8) trimws

We can use trimws to trim word characters off the left and then use it again to trim the colon. The whitespace argument used here requires R 3.6.0 or later.

trimws(trimws(string, "left", "\\w"), "left", ":")
## [1] "E001" "E002" "E003"

Note

The input string is assumed to be:

string <- c("G1:E001", "G2:E002", "G3:E003")

For example, using gsub or sub:

    gsub('.*:(.*)','\\1',string)
    [1] "E001" "E002" "E003"

Here is another simple answer:

gsub("^.*:","", string)

Late to the party, but for posterity, the stringr package (part of the popular "tidyverse" suite of packages) now provides functions with harmonised signatures for string handling:

string <- c("G1:E001", "G2:E002", "G3:E003")
# match string to keep
stringr::str_extract(string = string, pattern = "E[0-9]+")
# [1] "E001" "E002" "E003"

# replace leading string with ""
stringr::str_remove(string = string, pattern = "^.*:")
# [1] "E001" "E002" "E003"
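
If adding a package dependency is not desirable, a base R sketch of the same "keep the match" idea combines regexpr with regmatches. This is an alternative to str_extract, not a description of how stringr works internally, and it assumes every element contains a match (regmatches drops elements without one):

```r
string <- c("G1:E001", "G2:E002", "G3:E003")

# regexpr() locates the first match in each element; regmatches() extracts it
m <- regexpr("E[0-9]+", string)
kept <- regmatches(string, m)
kept
## [1] "E001" "E002" "E003"
```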

This should do for the example data:

gsub("[A-Z][1-9]:", "", string)

gives

[1] "E001" "E002" "E003"
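
Note that the pattern above matches exactly one uppercase letter followed by one digit from 1 to 9. A sketch with a hypothetical input string3 (not from the question) shows where that stops working, along with one possible generalisation:

```r
# string3 is a made-up input with a two-digit prefix and a zero prefix
string3 <- c("G1:E001", "G10:E002", "G0:E003")

# original pattern: only "G1:" is removed, the other elements are left unchanged
narrow <- gsub("[A-Z][1-9]:", "", string3)
narrow
## [1] "E001"     "G10:E002" "G0:E003"

# assumed generalisation: one uppercase letter, then one or more digits, then ":"
wider <- sub("^[A-Z][0-9]+:", "", string3)
wider
## [1] "E001" "E002" "E003"
```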

If you are using data.table then tstrsplit() is a natural choice:

library(data.table)
tstrsplit(string, ":")[[2]]
[1] "E001" "E002" "E003"

The unglue package provides an alternative; no knowledge of regular expressions is required for simple cases. Here we'd do:

# install.packages("unglue")
library(unglue)
string = c("G1:E001", "G2:E002", "G3:E003")
unglue_vec(string,"{x}:{y}", var = "y")
#> [1] "E001" "E002" "E003"

Created on 2019-11-06 by the reprex package (v0.3.0)

More info: https://github.com/moodymudskipper/unglue/blob/master/README.md
