 

Remove duplicates in string

Tags:

r

I have the following data set:

df <- data.frame(
    path = c("a,b,a", 
        "(direct) / (none),   (direct) / (none), google / cpc,    google / cpc", 
        "f,d", 
        "a,c"
    ) 
)

and I wish to remove the duplicates so that my output is:

                                                                       path
1:                                                                     a, b
2:                                       (direct) / (none),     google / cpc
3:                                                                     f, d
4:                                                                     a, c

I tried this, but it does not work for the second row:

library(data.table)
setDT(df)

df$path <- sapply(strsplit(as.character(df$path), split = ","), function(x) {
    paste(unique(x), collapse = ', ')
})
asked Jun 30 '26 01:06 by MFR



1 Answer

You were almost there. The only change needed is to split on ",\\s*" instead of just ",". With ",", calling unique won't produce the output you want, because pieces holding the same item can differ in the number of leading blanks (for example "google / cpc" versus "    google / cpc"), so unique treats them as distinct. If the blanks are consumed as part of the separator when you split, the issue goes away.
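To see the difference concretely, here is a small base-R check on the question's second row. The variable names `s`, `pieces_plain` and `pieces_trim` are only for illustration:

```r
# The second row of the question's data, as a plain character string
s <- "(direct) / (none),   (direct) / (none), google / cpc,    google / cpc"

# Splitting on "," alone keeps the blanks that follow each comma,
# so all four pieces are distinct strings and unique() drops nothing
pieces_plain <- strsplit(s, split = ",")[[1]]
length(unique(pieces_plain))   # 4

# Splitting on ",\\s*" consumes each comma together with the blanks after it,
# so the repeated items become identical and unique() can drop them
pieces_trim <- strsplit(s, split = ",\\s*")[[1]]
unique(pieces_trim)            # "(direct) / (none)" "google / cpc"
```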

On another note, since you used setDT(df), I guess you are using data.table. If so, you should use proper data.table syntax (the := operator, referring to the column by name inside the brackets) to avoid copies:

df[, path := sapply(
    strsplit(as.character(path), split = ",\\s*"),
    function(x) paste(unique(x), collapse = ', '))]

will modify the path column by reference.
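A base-R variant may be handy if data.table isn't loaded, or if blanks can also appear before a comma (e.g. "a ,b"), which ",\\s*" leaves in place. This is a sketch under those assumptions, not the method from the answer: it swaps the regex separator for a plain "," followed by trimws(), and the helper name `dedupe_path` is hypothetical.

```r
# Hypothetical helper (not from the answer): deduplicate the comma-separated
# items of each string, ignoring blanks on either side of every comma
dedupe_path <- function(x) {
  vapply(strsplit(as.character(x), split = ","),
         function(items) paste(unique(trimws(items)), collapse = ", "),
         character(1))  # vapply guarantees exactly one string per input row
}

# The question's data, rebuilt as a plain data.frame
df <- data.frame(
  path = c("a,b,a",
           "(direct) / (none),   (direct) / (none), google / cpc,    google / cpc",
           "f,d",
           "a,c"),
  stringsAsFactors = FALSE
)

df$path <- dedupe_path(df$path)
# expected values: "a, b", "(direct) / (none), google / cpc", "f, d", "a, c"
```

With data.table loaded and df converted by setDT, the same helper should also work by reference as df[, path := dedupe_path(path)].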

answered Jul 01 '26 14:07 by nicola
