 

If a text string contains something then return something in R

Tags:

r

Suppose I have a data frame df:

Product                                   Category   
Bill Payment for Torrent Power Limited    
Recharge of Videocon d2h DTH              
Bill Payment of Airtel Mobile
Recharge of Idea Mobile

If a string contains both "Bill Payment" and "Mobile", I want to tag its category as "Postpaid", and if a string contains both "Recharge" and "Mobile", I want to tag it as "Prepaid".

I am a beginner in R, so the simplest possible approach would be appreciated.

The result should be:

Product                                   Category   
Bill Payment for Torrent Power Limited    NA
Recharge of Videocon d2h DTH              NA
Bill Payment of Airtel Mobile             Postpaid
Recharge of Idea Mobile                   Prepaid
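To make the example reproducible, the data frame can be built as below. The name df1 is an assumption borrowed from the answers (the question calls it df), and the Category column is left out because the answers create it themselves.

```r
# Reproducible version of the example data; df1 is the name the answers use
df1 <- data.frame(
  Product = c("Bill Payment for Torrent Power Limited",
              "Recharge of Videocon d2h DTH",
              "Bill Payment of Airtel Mobile",
              "Recharge of Idea Mobile"),
  stringsAsFactors = FALSE  # keep Product as character (the default since R 4.0.0)
)
str(df1)
```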
Pankaj Kaundal asked Jan 28 '16 11:01



People also ask

How do you check if a string contains a value in R?

In R, the grepl() function checks whether a pattern occurs in a string. It returns a logical vector with one value per input element: TRUE where the pattern is found and FALSE otherwise.
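A short illustration of grepl() on a small character vector. The vector products is made up for this sketch from two of the question's rows; fixed = TRUE is a real grepl() argument that treats the pattern as a literal string rather than a regular expression.

```r
# grepl() returns one logical value per element of the input vector
products <- c("Bill Payment of Airtel Mobile", "Recharge of Videocon d2h DTH")
grepl("Mobile", products)
#> [1]  TRUE FALSE

# fixed = TRUE matches the pattern literally instead of as a regular expression
grepl("Bill Payment", products, fixed = TRUE)
#> [1]  TRUE FALSE
```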

How do you check if a substring is present in a string in R?

The str_detect() function from the stringr package checks whether a pattern matches each element of a character vector. It returns TRUE for each element where a match is found and FALSE otherwise.

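Is there a contains() function for strings in R?

Not in base R; use grepl() (or stringr::str_detect()) for that. The contains() helper from dplyr/tidyselect is different: it selects columns whose names contain a given string, rather than testing string values.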


2 Answers

We can use grepl to build logical indices for the elements of 'Product' that contain both 'Bill Payment' and 'Mobile' ('i1') or both 'Recharge' and 'Mobile' ('i2'). After initializing 'Category' as NA, we replace the elements selected by i1 and i2.

i1 <- grepl('Bill Payment', df1$Product) & grepl('Mobile', df1$Product)
i2 <- grepl('Recharge', df1$Product) & grepl('Mobile', df1$Product)
df1$Category <- NA
df1$Category[i1] <- 'Postpaid'
df1$Category[i2] <- 'Prepaid'
df1$Category
#[1] NA         NA         "Postpaid" "Prepaid" 

A slightly more compact option (which works for this example because the keyword always appears before 'Mobile') is

i1 <- grepl('.*Bill Payment.*Mobile.*', df1$Product)
i2 <- grepl('.*Recharge.*Mobile.*', df1$Product)
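The single-pattern form only matches when the keywords appear in that order. The sketch below contrasts it with the two-grepl() form; the second string in x is a hypothetical product name, not one from the question.

```r
# The single regex requires "Bill Payment" to come before "Mobile"
x <- c("Bill Payment of Airtel Mobile",   # from the question
       "Mobile Bill Payment for Airtel")  # hypothetical: "Mobile" comes first
grepl('.*Bill Payment.*Mobile.*', x)
#> [1]  TRUE FALSE

# Two separate grepl() calls combined with & do not care about order
grepl('Bill Payment', x) & grepl('Mobile', x)
#> [1] TRUE TRUE
```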

and then assign the categories with ifelse.
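A minimal sketch of the ifelse step, assuming the df1 from the question. Nesting two ifelse() calls is one way to combine i1 and i2; it checks i1 first, then i2, and falls back to NA.

```r
df1 <- data.frame(
  Product = c("Bill Payment for Torrent Power Limited",
              "Recharge of Videocon d2h DTH",
              "Bill Payment of Airtel Mobile",
              "Recharge of Idea Mobile"),
  stringsAsFactors = FALSE
)
i1 <- grepl('.*Bill Payment.*Mobile.*', df1$Product)
i2 <- grepl('.*Recharge.*Mobile.*', df1$Product)

# Nested ifelse(): "Postpaid" where i1 is TRUE, else "Prepaid" where i2 is TRUE, else NA
df1$Category <- ifelse(i1, 'Postpaid', ifelse(i2, 'Prepaid', NA))
df1$Category
#> [1] NA         NA         "Postpaid" "Prepaid"
```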

akrun answered Oct 09 '22 06:10



A different approach is to build a numeric index first (1 = no match, 2 = Postpaid, 3 = Prepaid) and then use it to pick the respective values from a lookup vector:

indx <- (grepl('Bill Payment', df1$Product) & grepl('Mobile', df1$Product)) + 
  (grepl('Recharge', df1$Product) & grepl('Mobile', df1$Product))*2 + 1L

df1$category <- c(NA, "Postpaid", "Prepaid")[indx]

which gives:

> df1
                                 Product category
1 Bill Payment for Torrent Power Limited     <NA>
2           Recharge of Videocon d2h DTH     <NA>
3          Bill Payment of Airtel Mobile Postpaid
4                Recharge of Idea Mobile  Prepaid
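To see why the lookup works, it helps to print indx itself. Each logical is coerced to 0 or 1 in the arithmetic, so a row gets 1 when nothing matches, 2 for a Postpaid match and 3 for a Prepaid match. This sketch rebuilds the question's df1 so it runs on its own.

```r
df1 <- data.frame(
  Product = c("Bill Payment for Torrent Power Limited",
              "Recharge of Videocon d2h DTH",
              "Bill Payment of Airtel Mobile",
              "Recharge of Idea Mobile"),
  stringsAsFactors = FALSE
)
# TRUE/FALSE become 1/0, so the sum is 1 (no match), 2 (Postpaid) or 3 (Prepaid)
indx <- (grepl('Bill Payment', df1$Product) & grepl('Mobile', df1$Product)) +
  (grepl('Recharge', df1$Product) & grepl('Mobile', df1$Product))*2 + 1L
indx
#> [1] 1 1 2 3

# Position 1 of the lookup vector is NA, so unmatched rows get NA
c(NA, "Postpaid", "Prepaid")[indx]
#> [1] NA         NA         "Postpaid" "Prepaid"
```

A product matching both rules would get index 4, which is out of range for the three-element lookup vector and therefore also yields NA.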

You can also create this index using the more compact notation as proposed by @akrun:

indx <- grepl('.*Bill Payment.*Mobile.*', df1$Product) + 
  grepl('.*Recharge.*Mobile.*', df1$Product)*2 + 1L

Or, as @nicola proposed, compute the 'Mobile' match only once:

tmp <- grepl('Mobile', df1$Product)
indx <- (grepl('Bill Payment', df1$Product) & tmp) + (grepl('Recharge', df1$Product) & tmp)*2 + 1L
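As a quick sanity check on the question's data, the three ways of building indx shown in this answer can be compared directly. The variable names p, indx_a, indx_b and indx_c are introduced only for this sketch.

```r
df1 <- data.frame(
  Product = c("Bill Payment for Torrent Power Limited",
              "Recharge of Videocon d2h DTH",
              "Bill Payment of Airtel Mobile",
              "Recharge of Idea Mobile"),
  stringsAsFactors = FALSE
)
p <- df1$Product

# Separate grepl() calls for each keyword
indx_a <- (grepl('Bill Payment', p) & grepl('Mobile', p)) +
  (grepl('Recharge', p) & grepl('Mobile', p))*2 + 1L
# Compact single-regex form (order-sensitive)
indx_b <- grepl('.*Bill Payment.*Mobile.*', p) + grepl('.*Recharge.*Mobile.*', p)*2 + 1L
# 'Mobile' match computed once and reused
tmp <- grepl('Mobile', p)
indx_c <- (grepl('Bill Payment', p) & tmp) + (grepl('Recharge', p) & tmp)*2 + 1L

identical(indx_a, indx_b) && identical(indx_a, indx_c)
#> [1] TRUE
```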
Jaap answered Oct 09 '22 06:10
