
How to count instances of duplicate characters within a string?

I have a data frame:

levels     counts
1, 2, 2        24
1, 2           20
1, 3, 3, 3     15
1, 3           10
1, 2, 3        25

I want to treat, for example, "1, 2, 2" and "1, 2" as the same level. Repeated values should be ignored, so any entry that contains a "1" and a "2" and no other value counts as the level "1, 2". Here is the desired data frame:

levels     counts
1, 2           44
1, 3           25
1, 2, 3        25
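The rule above amounts to a set comparison: two entries are the same level when they contain exactly the same values, regardless of repeats or order. A minimal sketch of that check, assuming values are always separated by ", " as in the example (the helper name same_level is hypothetical):

```r
# Hypothetical helper: two level strings are "the same level" when they
# contain exactly the same set of values, ignoring repeats and order
same_level <- function(a, b) {
  setequal(strsplit(a, ", ")[[1]], strsplit(b, ", ")[[1]])
}

same_level("1, 2, 2", "1, 2")     # TRUE
same_level("1, 3, 3, 3", "1, 3")  # TRUE
same_level("1, 2, 3", "1, 2")     # FALSE: "3" is an extra value
```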

Here is code to reproduce the original data frame:

df <- data.frame(levels = c("1, 2, 2", "1, 2", "1, 3, 3, 3", "1, 3", "1, 2, 3"), 
                 counts = c(24, 20, 15, 10, 25))
df$levels <- as.character(df$levels)  # needed in R < 4.0, where data.frame() converts strings to factors by default
asked Jul 31 '17 at 15:07 by JRP



1 Answer

Split each entry of df$levels on ", ", keep the unique values, sort them, and paste them back together into a canonical key (levels2). Then aggregate counts by that key.

df$levels2 = sapply(strsplit(df$levels, ", "), function(x)
    paste(sort(unique(x)), collapse = ", "))   # or: toString(sort(unique(x)))
aggregate(counts~levels2, df, sum)
#  levels2 counts
#1    1, 2     44
#2 1, 2, 3     25
#3    1, 3     25
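sort() on a character vector orders values as text, so a multi-digit level such as "10" sorts before "2". The grouping is still consistent (every permutation gets the same key), but the labels read as "10, 2". A minimal sketch of a variant, assuming every level is an integer, converts the pieces to integers before sorting and splits on a comma followed by optional spaces, so "10,2" and "10, 2" also match. The data frame df2 is hypothetical example data, not from the question.

```r
# Hypothetical data with a multi-digit level and inconsistent spacing
df2 <- data.frame(levels = c("2, 10, 10", "10,2", "1, 3", "3, 1, 1"),
                  counts = c(5, 7, 10, 4),
                  stringsAsFactors = FALSE)

# Canonical key: split on a comma plus optional spaces, drop duplicates,
# sort numerically (assumes integer levels), and join with ", "
df2$levels2 <- vapply(strsplit(df2$levels, ",\\s*"), function(x)
  toString(sort(unique(as.integer(x)))), character(1))

# Sum counts within each canonical level
totals <- aggregate(counts ~ levels2, data = df2, FUN = sum)
totals
```

For df2 this gives a total of 14 for "1, 3" and 12 for "2, 10". On the question's df, where every level is a single digit, it produces the same totals as the answer's code.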
answered Jan 14 '23 at 18:01 by d.b
