I have a data frame; an abbreviated version is below:
structure(list(sex1 = c("totalmaleglobal", "totalfemaleglobal",
"totalglobal", "totalfemaleGSK", "totalfemaleglobal",
"totalfemaleUN")), .Names = "sex1", row.names = c(NA, 6L),
class="data.frame")
I want to extract the words 'total', 'totalmale', and 'totalfemale'.
How do I do this?
I tried regex with the following code:
pattern1=c("total")
pattern2=c("totalmale")
pattern3=c("totalfemale")
daly$sex <- str_extract(daly$sex1,pattern1)
daly$sex <- str_extract(daly$sex1,pattern2)
daly$sex <- str_extract(daly$sex1,pattern3)
But it's giving me NA.
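Maybe combine the patterns into a single alternation, longest first (this assumes pattern1 to pattern3 from the question exist in your workspace):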
library(stringr)
daly$sex <- str_extract(daly$sex1,paste(rev(mget(ls(pattern = "pattern\\d+"))), collapse="|"))
daly
#                sex1         sex
# 1   totalmaleglobal   totalmale
# 2 totalfemaleglobal totalfemale
# 3       totalglobal       total
# 4    totalfemaleGSK totalfemale
# 5 totalfemaleglobal totalfemale
# 6     totalfemaleUN totalfemale
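For what it's worth, the attempt in the question overwrites daly$sex on every line, so only the result for pattern3 survives, and that is NA for any row that does not contain "totalfemale". Below is a minimal base R sketch of a single-pattern alternative. It assumes every value starts with "total", optionally followed by "male" or "female"; the names pattern and m are mine, not from the question.

```r
# Sample data from the question
daly <- data.frame(
  sex1 = c("totalmaleglobal", "totalfemaleglobal", "totalglobal",
           "totalfemaleGSK", "totalfemaleglobal", "totalfemaleUN"),
  stringsAsFactors = FALSE
)

# One pattern: "total", optionally followed by "male" or "female"
pattern <- "total(male|female)?"

# regexpr() locates the first match in each string (-1 means no match);
# regmatches() returns the matched text for the strings that did match
m <- regexpr(pattern, daly$sex1)
daly$sex <- NA_character_
daly$sex[m != -1] <- regmatches(daly$sex1, m)

daly$sex
# [1] "totalmale"   "totalfemale" "total"       "totalfemale" "totalfemale"
# [6] "totalfemale"
```

Rows without a match keep NA, which makes unexpected values easy to spot.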
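Two steps with gsub, assuming d1 holds the data frame from the question:
v1 <- c('total', 'totalmale', 'totalfemale')
# Step 1: strip the known prefixes, leaving the suffixes ("global", "GSK", "UN")
v2 <- gsub(paste(v1, collapse='|'), '', d1$sex1)
# Step 2: strip those suffixes from the original strings, leaving the prefixes
gsub(paste(v2, collapse='|'), '', d1$sex1)
#[1] "totalmale" "totalfemale" "total" "totalfemale" "totalfemale" "totalfemale"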