Consider that I have a data frame, df:
Product Category
Bill Payment for Torrent Power Limited
Recharge of Videocon d2h DTH
Bill Payment of Airtel Mobile
Recharge of Idea Mobile
Now, if a string contains both "Bill Payment" and "Mobile", I want to tag its category as "Postpaid", and if a string contains both "Recharge" and "Mobile", I want to tag it as "Prepaid".
I am a beginner in R, so the easiest way would be appreciated.
The result should be:
Product Category
Bill Payment for Torrent Power Limited NA
Recharge of Videocon d2h DTH NA
Bill Payment of Airtel Mobile Postpaid
Recharge of Idea Mobile Prepaid
In R, we use the grepl() function to check whether a pattern is present in a string. It returns a logical value for each element: TRUE if the pattern is found in the string, and FALSE otherwise.
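As a minimal sketch of how grepl() behaves, the snippet below checks two strings borrowed from the question. The vector name x and the variables has_mobile and has_bill are made up for illustration.

```r
# Hypothetical example vector, using two rows from the question
x <- c("Bill Payment of Airtel Mobile", "Recharge of Videocon d2h DTH")

# grepl() returns one logical value per element of x
has_mobile <- grepl("Mobile", x)

# fixed = TRUE treats the pattern as a literal string, not a regular expression
has_bill <- grepl("Bill Payment", x, fixed = TRUE)

# Conditions can be combined element-wise with &
both <- has_mobile & has_bill
both
```

Only the first string contains both "Bill Payment" and "Mobile", so only the first element of both is TRUE.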
Method 2: Using str_detect(). The str_detect() function from the stringr package checks whether a pattern matches within each string. It returns TRUE where a match is found and FALSE otherwise, with one value for each element of the input vector.
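A rough equivalent with str_detect() might look like the following. It assumes the stringr package is installed; the names product and is_prepaid are illustrative only. Note that the string comes first and the pattern second, the reverse of grepl().

```r
library(stringr)  # assumption: stringr is installed

# Hypothetical example vector, using rows from the question
product <- c("Bill Payment of Airtel Mobile",
             "Recharge of Idea Mobile",
             "Recharge of Videocon d2h DTH")

# str_detect(string, pattern) returns one logical value per element
is_prepaid <- str_detect(product, "Recharge") & str_detect(product, "Mobile")
is_prepaid
```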
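Note that base R has no contains() function for strings. The contains() helper from dplyr/tidyselect selects columns whose names contain a given string; it does not test the values inside a column.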
We can use grepl to build a logical index of the 'Product' values that contain both 'Bill Payment' and 'Mobile' ('i1') or both 'Recharge' and 'Mobile' ('i2'). After initializing 'Category' as NA, we replace the elements selected by i1 and i2.
i1 <- grepl('Bill Payment', df1$Product) & grepl('Mobile', df1$Product)
i2 <- grepl('Recharge', df1$Product) & grepl('Mobile', df1$Product)
df1$Category <- NA
df1$Category[i1] <- 'Postpaid'
df1$Category[i2] <- 'Prepaid'
df1$Category
#[1] NA NA "Postpaid" "Prepaid"
Or a slightly more compact option (which works with this example because 'Mobile' always comes after the other keyword) is
i1 <- grepl('.*Bill Payment.*Mobile.*', df1$Product)
i2 <- grepl('.*Recharge.*Mobile.*', df1$Product)
and then assign the categories with ifelse.
A different approach is to create a numeric index first and then use it to look up the respective values:
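The ifelse step is not spelled out above, so here is one way it could look. This is a sketch, not necessarily what the answerer had in mind: df1 is rebuilt from the question's rows so the snippet runs on its own, and the redundant leading and trailing '.*' are dropped from the patterns.

```r
# Rebuild the example data from the question so the snippet is self-contained
df1 <- data.frame(
  Product = c("Bill Payment for Torrent Power Limited",
              "Recharge of Videocon d2h DTH",
              "Bill Payment of Airtel Mobile",
              "Recharge of Idea Mobile"),
  stringsAsFactors = FALSE
)

# Each pattern requires 'Mobile' to appear somewhere after the first keyword
i1 <- grepl('Bill Payment.*Mobile', df1$Product)
i2 <- grepl('Recharge.*Mobile', df1$Product)

# Nested ifelse: Postpaid where i1, else Prepaid where i2, else NA
df1$Category <- ifelse(i1, 'Postpaid', ifelse(i2, 'Prepaid', NA_character_))
df1$Category
```

Using NA_character_ keeps the column a character vector even when nothing matches.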
indx <- (grepl('Bill Payment', df1$Product) & grepl('Mobile', df1$Product)) +
(grepl('Recharge', df1$Product) & grepl('Mobile', df1$Product))*2 + 1L
df1$category <- c(NA, "Postpaid", "Prepaid")[indx]
which gives:
> df1
Product category
1 Bill Payment for Torrent Power Limited <NA>
2 Recharge of Videocon d2h DTH <NA>
3 Bill Payment of Airtel Mobile Postpaid
4 Recharge of Idea Mobile Prepaid
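To see why the index arithmetic works, it may help to print the intermediate values. The sketch below uses a plain character vector (product, a name chosen here for illustration) instead of df1. The logical results are coerced to 0/1, so a "Postpaid" match contributes 1, a "Prepaid" match contributes 2, and the final + 1L shifts everything so that "no match" lands on position 1 of the lookup vector.

```r
# Example rows from the question
product <- c("Bill Payment for Torrent Power Limited",
             "Recharge of Videocon d2h DTH",
             "Bill Payment of Airtel Mobile",
             "Recharge of Idea Mobile")

postpaid <- grepl('Bill Payment', product) & grepl('Mobile', product)
prepaid  <- grepl('Recharge', product) & grepl('Mobile', product)

# TRUE/FALSE become 1/0: no match -> 1, Postpaid -> 2, Prepaid -> 3
indx <- postpaid + prepaid * 2 + 1L
indx

# Use the index to pick a label from the lookup vector
labels <- c(NA, "Postpaid", "Prepaid")[indx]
labels
```

One caveat: a string matching both rules would get index 4, which is past the end of the lookup vector and therefore yields NA.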
You can also create this index using the more compact notation as proposed by @akrun:
indx <- grepl('.*Bill Payment.*Mobile.*', df1$Product) +
grepl('.*Recharge.*Mobile.*', df1$Product)*2 + 1L
Or, as @nicola proposed, compute the 'Mobile' match once and reuse it:
tmp <- grepl('Mobile', df1$Product)
indx <- (grepl('Bill Payment', df1$Product) & tmp) + (grepl('Recharge', df1$Product) & tmp)*2 + 1L