Remove duplicate words from cells in R

Question

I have a 2-column data frame, where the first column is a number, and the second column contains a comma-separated list of research categories. A reduced version of my data:

aa <- data.frame(a=c(1:4),b=c("Fisheries, Fisheries, Geography, Marine Biology", 
"Fisheries", "Marine Biology, Marine Biology, Fisheries, Zoology", "Geography"))

I want to remove the duplicate categories within each cell of column b, so that the end result is:

    a        b
    1        Fisheries, Geography, Marine Biology
    2        Fisheries
    3        Marine Biology, Fisheries, Zoology
    4        Geography

I can do this for a single cell, for example with unique(unlist(strsplit(aa$b[1], ", "))), but not for the entire column: applied to the whole column, it returns one unique list for all cells combined. I can’t figure out how to apply it to each cell in turn. Maybe with lapply and a custom function wrapping unique(unlist(strsplit()))?
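The lapply idea works because strsplit() is already vectorized. Applied to the whole column, it returns a list with one character vector per cell, and it is only unlist() that merges those vectors into a single one. Below is a minimal base-R sketch of the "own function per cell" approach that needs no extra packages. dedup_cell is a hypothetical helper name, and the sketch assumes categories are always separated by exactly a comma and a space:

```r
# Sample data from the question; stringsAsFactors = FALSE keeps b as character on R < 4.0
aa <- data.frame(a = 1:4,
                 b = c("Fisheries, Fisheries, Geography, Marine Biology",
                       "Fisheries",
                       "Marine Biology, Marine Biology, Fisheries, Zoology",
                       "Geography"),
                 stringsAsFactors = FALSE)

# Hypothetical helper: de-duplicate the categories in one comma-separated cell
dedup_cell <- function(x, sep = ", ") {
  parts <- strsplit(x, sep, fixed = TRUE)[[1]]  # split this one cell into categories
  paste(unique(parts), collapse = sep)          # keep first occurrences, rejoin
}

# vapply calls the helper once per cell and guarantees a single string per cell
aa$b <- vapply(aa$b, dedup_cell, character(1), USE.NAMES = FALSE)
print(aa)
```

unique() keeps the first occurrence of each category, so the original order within each cell is preserved.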

Many thanks!

Matt Jewett · Accepted Answer

This should work for you.

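library(stringr)  # provides str_split()

aa <- data.frame(a=c(1:4),b=c("Fisheries, Fisheries, Geography, Marine Biology", 
                              "Fisheries", "Marine Biology, Marine Biology, Fisheries, Zoology", "Geography"))

# For each cell: split on ", ", keep the first occurrence of each category, then rejoin
aa$b <- sapply(aa$b, function(x) paste(unique(unlist(str_split(x, ", "))), collapse = ", "),
               USE.NAMES = FALSE)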
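The answer splits on exactly ", ", so a cell with irregular spacing (for example "Fisheries,Fisheries " with no space after the comma and a trailing space) would keep both spellings as different categories. If real data might look like that, which the question's sample does not, one option is to split on the bare comma and trim each piece before calling unique(). A sketch under that assumption; clean_categories and the messy example strings are hypothetical:

```r
# Hypothetical helper: tolerate irregular spacing around the commas
clean_categories <- function(x) {
  parts <- trimws(unlist(strsplit(x, ",", fixed = TRUE)))  # split on commas, trim spaces
  parts <- parts[parts != ""]                               # drop empty pieces from stray commas
  paste(unique(parts), collapse = ", ")
}

# Made-up cells with inconsistent spacing, for illustration only
messy <- c("Fisheries,Fisheries , Geography", "Zoology, Zoology,")
cleaned <- vapply(messy, clean_categories, character(1), USE.NAMES = FALSE)
print(cleaned)
```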

Tags: list, r, unique

Tessa Francis · 1 answer · Matt Jewett