Extracting matched words from a string

Tags: regex, r

I have a data frame, daly; an abbreviated version is below:

structure(list(sex1 = c("totalmaleglobal", "totalfemaleglobal", 
"totalglobal", "totalfemaleGSK", "totalfemaleglobal", 
"totalfemaleUN")), .Names = "sex1", row.names = c(NA, 6L),
class="data.frame")

I want to extract the words 'total', 'totalmale', or 'totalfemale' from each value of sex1 into a new column.

How do I do this?

I tried a regex approach using str_extract with the following code:

pattern1=c("total")
pattern2=c("totalmale")
pattern3=c("totalfemale")

daly$sex <- str_extract(daly$sex1,pattern1)
daly$sex <- str_extract(daly$sex1,pattern2)
daly$sex <- str_extract(daly$sex1,pattern3)

But it's giving me NA.

asked Sep 09 '16 07:09

user3919790

2 Answers

Maybe

library(stringr)
daly$sex <- str_extract(daly$sex1,paste(rev(mget(ls(pattern = "pattern\\d+"))), collapse="|"))
daly
#                sex1         sex
# 1   totalmaleglobal   totalmale
# 2 totalfemaleglobal totalfemale
# 3       totalglobal       total
# 4    totalfemaleGSK totalfemale
# 5 totalfemaleglobal totalfemale
# 6     totalfemaleUN totalfemale
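
The call above collects pattern1, pattern2, pattern3 from the workspace and reverses them so the longer words are tried first. A minimal sketch of the same idea with the words spelled out explicitly is below. The names words and pattern and the leading ^ anchor are my own assumptions (the anchor presumes the word always sits at the start of the string, as it does in the sample data):

```r
library(stringr)

# Sample data from the question
daly <- data.frame(
  sex1 = c("totalmaleglobal", "totalfemaleglobal", "totalglobal",
           "totalfemaleGSK", "totalfemaleglobal", "totalfemaleUN"),
  stringsAsFactors = FALSE
)

# Candidate words (hypothetical vector name); sorted below, so input order is free
words <- c("total", "totalmale", "totalfemale")

# Longest first, so "total" cannot win over "totalmale" or "totalfemale"
words <- words[order(nchar(words), decreasing = TRUE)]

# Assumption: the word is always a prefix, hence the ^ anchor
pattern <- paste0("^(", paste(words, collapse = "|"), ")")

# One call, one pattern; nothing is overwritten by a later assignment
daly$sex <- str_extract(daly$sex1, pattern)
daly$sex
```

Sorting by length avoids depending on the order in which the pattern variables happen to be listed.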

answered Oct 12 '22 13:10

lukeA

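Two steps with gsub: first strip the known words to get what is left over, then strip those leftovers from the original strings.

# d1 is the data frame from the question
v1 <- c('total', 'totalmale', 'totalfemale')

# Step 1: remove the known words, leaving only the suffixes
v2 <- gsub(paste(v1, collapse='|'), '', d1$sex1)

# Step 2: remove the suffixes from the original strings
gsub(paste(v2, collapse='|'), '', d1$sex1)
#[1] "totalmale"   "totalfemale" "total"       "totalfemale" "totalfemale" "totalfemale"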
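
If a direct extraction in base R is preferred over removing the remainder, something along these lines should also work. This is only a sketch, not part of the answer above: it swaps in regexpr with regmatches, uses perl = TRUE so that the alternatives are tried in the order written, and lists the words longest first. The name res is hypothetical.

```r
# Sample data from the question
d1 <- data.frame(
  sex1 = c("totalmaleglobal", "totalfemaleglobal", "totalglobal",
           "totalfemaleGSK", "totalfemaleglobal", "totalfemaleUN"),
  stringsAsFactors = FALSE
)

# Longest alternatives first; with perl = TRUE the first one that matches wins
v1 <- c("totalfemale", "totalmale", "total")

# Position of the first match in each string (-1 where nothing matches)
m <- regexpr(paste(v1, collapse = "|"), d1$sex1, perl = TRUE)

# regmatches() drops non-matching elements, so fill a result vector by index
res <- rep(NA_character_, nrow(d1))
res[m > 0] <- regmatches(d1$sex1, m)
res
```

Strings that contain none of the words stay NA instead of being returned unchanged.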

answered Oct 12 '22 12:10

Sotos
