
Extracting matched words from a string

Tags: regex, r

I have a data frame; an abbreviated version is below:

daly <- structure(list(sex1 = c("totalmaleglobal", "totalfemaleglobal", 
"totalglobal", "totalfemaleGSK", "totalfemaleglobal", 
"totalfemaleUN")), .Names = "sex1", row.names = c(NA, 6L),
class="data.frame")

I want to extract the words 'total', 'totalmale' and 'totalfemale' from each value.

How do I do this?

I tried regex with the following code

library(stringr)

pattern1=c("total")
pattern2=c("totalmale")
pattern3=c("totalfemale")

daly$sex <- str_extract(daly$sex1,pattern1)
daly$sex <- str_extract(daly$sex1,pattern2)
daly$sex <- str_extract(daly$sex1,pattern3)

But it's giving me NA.

user3919790 asked Sep 09 '16 07:09


2 Answers

Maybe combine the patterns into a single alternation, with the longer ones first (hence the rev()):

library(stringr)
daly$sex <- str_extract(daly$sex1,paste(rev(mget(ls(pattern = "pattern\\d+"))), collapse="|"))
daly
#                sex1         sex
# 1   totalmaleglobal   totalmale
# 2 totalfemaleglobal totalfemale
# 3       totalglobal       total
# 4    totalfemaleGSK totalfemale
# 5 totalfemaleglobal totalfemale
# 6     totalfemaleUN totalfemale
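
The same idea can be sketched without looking the patterns up by name in the workspace. This is only an illustration in base R, not the code from the answer above: the vector name patterns and the leading ^ anchor are assumptions, and the data frame is rebuilt here so the snippet runs on its own. With perl = TRUE the first alternative that matches wins, so the patterns are sorted longest first.

```r
# Sample data from the question, rebuilt so the snippet is self-contained
daly <- data.frame(
  sex1 = c("totalmaleglobal", "totalfemaleglobal", "totalglobal",
           "totalfemaleGSK", "totalfemaleglobal", "totalfemaleUN"),
  stringsAsFactors = FALSE
)

# Hypothetical name: one explicit vector instead of pattern1..pattern3
patterns <- c("total", "totalmale", "totalfemale")

# Longest first, so "total" cannot shadow the longer alternatives
patterns <- patterns[order(nchar(patterns), decreasing = TRUE)]

# Assumption: the word always sits at the start of the string
regex <- paste0("^(?:", paste(patterns, collapse = "|"), ")")

# regexpr() returns -1 where nothing matches; those rows stay NA
m <- regexpr(regex, daly$sex1, perl = TRUE)
daly$sex <- NA_character_
daly$sex[m > 0] <- regmatches(daly$sex1, m)

daly
```

Keeping the patterns in one vector also means a fourth pattern can be added without changing the rest of the code.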
lukeA answered Oct 12 '22 13:10


Two steps with gsub,

v2 <- gsub(paste(v1, collapse='|'), '', d1$sex1)

gsub(paste(v2, collapse='|'), '', d1$sex1)
#[1] "totalmale"   "totalfemale" "total"       "totalfemale" "totalfemale" "totalfemale"

where

v1 <- c('total', 'totalmale', 'totalfemale')
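
For comparison, a one-pass sketch that keeps the prefix with a capture group instead of deleting the remainder in two steps. It is not part of the answer above and assumes every value starts with 'total', optionally followed by 'male' or 'female'; x is a stand-in for d1$sex1.

```r
# Stand-in for d1$sex1 (same values as in the question)
x <- c("totalmaleglobal", "totalfemaleglobal", "totalglobal",
       "totalfemaleGSK", "totalfemaleglobal", "totalfemaleUN")

# Capture "total" plus an optional "male"/"female", then drop the rest.
# Strings that do not start with "total" are returned unchanged by sub().
sex <- sub("^(total(?:male|female)?).*$", "\\1", x, perl = TRUE)

sex
```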
Sotos answered Oct 12 '22 12:10