I can't find an elegant way to achieve this; please help.
I have a data.table, DT:
name,value
"lorem pear ipsum",4
"apple ipsum lorem",2
"lorem ipsum plum",6
Based on a vector Fruits <- c("pear", "apple", "plum"),
I'd like to create a factor-type column:
name,value,factor
"lorem pear ipsum",4,"pear"
"apple ipsum lorem",2,"apple"
"lorem ipsum plum",6,"plum"
I guess that's basic, but I'm kind of stuck. This is how far I got:
DT[grep("apple", name, ignore.case=TRUE), factor := as.factor("apple")]
Thanks in advance.
You can vectorize this with regular expressions, e.g. by using gsub():
Set up the data:
strings <- c("lorem pear ipsum", "apple ipsum lorem", "lorem ipsum plum")
fruit <- c("pear", "apple", "plum")
Now create a regular expression and apply it:
ptn <- paste0(".*(", paste(fruit, collapse="|"), ").*")
gsub(ptn, "\\1", strings)
[1] "pear" "apple" "plum"
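The regular expression works by separating each search element with |, all wrapped in parentheses to form a capture group. The surrounding .* make the pattern match the whole string, so replacing the match with \\1 leaves only the captured fruit: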
ptn
[1] ".*(pear|apple|plum).*"
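One caveat worth knowing about this pattern: when none of the alternatives match, gsub() returns the input string unchanged rather than NA. Below is a minimal sketch of that behaviour and one possible way to handle it. The test string "lorem ipsum" is made up for illustration and is not part of the question's data.

```r
fruit <- c("pear", "apple", "plum")
ptn <- paste0(".*(", paste(fruit, collapse = "|"), ").*")

# "lorem ipsum" is a made-up input that contains no fruit
x <- c("lorem pear ipsum", "lorem ipsum")

res <- gsub(ptn, "\\1", x)
res
# "pear" for the first element; the second comes back unchanged as "lorem ipsum"

# One option: set the non-matching elements to NA explicitly
res[!grepl(ptn, x)] <- NA_character_
res
# "pear" for the first element, NA for the second
```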
To do this inside a data.table, as per your question, is then as simple as:
library(data.table)
DT <- data.table(name=strings, value=c(4, 2, 6))
DT[, factor := gsub(ptn, "\\1", name)]
DT
                name value factor
1:  lorem pear ipsum     4   pear
2: apple ipsum lorem     2  apple
3:  lorem ipsum plum     6   plum
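Note that gsub() returns a character vector, while the question asks for a factor column. A possible variant is to wrap the result in factor() with the fruit vector as its levels. This is only a sketch: the column name fruit_type is a hypothetical choice of mine, not something from the question. With levels = fruit, any extracted value that is not one of the fruits (for example, a row with no match) becomes NA.

```r
library(data.table)

fruit <- c("pear", "apple", "plum")
ptn <- paste0(".*(", paste(fruit, collapse = "|"), ").*")

DT <- data.table(
  name  = c("lorem pear ipsum", "apple ipsum lorem", "lorem ipsum plum"),
  value = c(4, 2, 6)
)

# Extract the fruit from the `name` column, then convert it to a factor.
# `fruit_type` is a hypothetical column name chosen for this sketch.
# Values not found in `fruit` (rows without a match) become NA.
DT[, fruit_type := factor(gsub(ptn, "\\1", name), levels = fruit)]

str(DT$fruit_type)
```

The factor levels follow the order of fruit, so levels(DT$fruit_type) gives "pear", "apple", "plum" regardless of the row order.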