I have a dataframe with 2 columns GL and GLDESC and want to add a 3rd column called KIND based on some data that is inside of column GLDESC.
The dataframe is as follows:
      GL                             GLDESC 1 515100         Payroll-Indir Salary Labor 2 515900 Payroll-Indir Compensated Absences 3 532300                           Bulk Gas 4 539991                     Area Charge In 5 551000        Repairs & Maint-Spare Parts 6 551100                 Supplies-Operating 7 551300                        Consumables   For each row of the data table:
GLDESC contains the word Payroll anywhere in the string then I want KIND to be Payroll GLDESC contains the word Gas anywhere in the string then I want KIND to be Materials KIND to be Other I looked for similar examples on stackoverflow but could not find any, also looked in R for dummies on switch, grep, apply and regular expressions to try and match only part of the GLDESC column and then fill the KIND column with the kind of account but was unable to make it work.
Since you have only two conditions, you can use a nested ifelse:
#random data; it wasn't easy to copy-paste yours   DF <- data.frame(GL = sample(10), GLDESC = paste(sample(letters, 10),    c("gas", "payroll12", "GaSer", "asdf", "qweaa", "PayROll-12",       "asdfg", "GAS--2", "fghfgh", "qweee"), sample(letters, 10), sep = " "))  DF$KIND <- ifelse(grepl("gas", DF$GLDESC, ignore.case = T), "Materials",           ifelse(grepl("payroll", DF$GLDESC, ignore.case = T), "Payroll", "Other"))  DF #   GL         GLDESC      KIND #1   8        e gas l Materials #2   1  c payroll12 y   Payroll #3  10      m GaSer v Materials #4   6       t asdf n     Other #5   2      w qweaa t     Other #6   4 r PayROll-12 q   Payroll #7   9      n asdfg a     Other #8   5     d GAS--2 w Materials #9   7     s fghfgh e     Other #10  3      g qweee k     Other   EDIT 10/3/2016 (..after receiving more attention than expected)
A possible solution to deal with more patterns could be to iterate over all patterns and, whenever there is match, progressively reduce the amount of comparisons:
ff = function(x, patterns, replacements = patterns, fill = NA, ...) {     stopifnot(length(patterns) == length(replacements))      ans = rep_len(as.character(fill), length(x))         empty = seq_along(x)      for(i in seq_along(patterns)) {         greps = grepl(patterns[[i]], x[empty], ...)         ans[empty[greps]] = replacements[[i]]           empty = empty[!greps]     }      return(ans) }  ff(DF$GLDESC, c("gas", "payroll"), c("Materials", "Payroll"), "Other", ignore.case = TRUE) # [1] "Materials" "Payroll"   "Materials" "Other"     "Other"     "Payroll"   "Other"     "Materials" "Other"     "Other"  ff(c("pat1a pat2", "pat1a pat1b", "pat3", "pat4"),     c("pat1a|pat1b", "pat2", "pat3"),     c("1", "2", "3"), fill = "empty") #[1] "1"     "1"     "3"     "empty"  ff(c("pat1a pat2", "pat1a pat1b", "pat3", "pat4"),     c("pat2", "pat1a|pat1b", "pat3"),     c("2", "1", "3"), fill = "empty") #[1] "2"     "1"     "3"     "empty" 
                        If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With