
Unique words by group

This is my example data frame:

example = data.frame(group = c("A", "B", "A", "A"), word = c("car", "sun ,sun, house", "car, house", "tree"))

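I would like to keep only the words that are unique both within a group and across groups: repeats inside a group are collapsed, and any word that occurs in more than one group is dropped.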

So I would like to get this:

group   word
A       car, tree
B       sun

I used aggregate and got this:

aggregate(word ~ group , data = example,  FUN = paste0) 

  group                  word
1     A car, car, house, tree
2     B       sun ,sun, house

But now I need to keep only the unique values, and even this attempt does not work:

for (i in 1:nrow(cluster)) {cluster[i, ][["word"]] = lapply(unlist(cluster[i, ][["word"]]), unique)}

It fails with:

Error in `[[<-.data.frame`(`*tmp*`, "word", value = list("car", "car, house",  : 
  replacement has 3 rows, data has 1
onhalu asked Apr 07 '26 06:04



2 Answers

A base R option using aggregate + subset + ave:

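with(
  # Split every cell on commas/spaces and pool the words per group
  aggregate(
    word ~ .,
    example,
    function(x) {
      unlist(strsplit(x, "[, ]+"))
    }
  ),
  # Collapse the surviving words back to one row per group
  aggregate(
    . ~ ind,
    subset(
      # Long format (values = word, ind = group), repeats within a group dropped
      unique(stack(setNames(word, group))),
      # Keep only words that occur in exactly one group
      ave(seq_along(ind), values, FUN = length) == 1
    ),
    c
  )
)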

gives

  ind    values
1   A car, tree
2   B       sun
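
The same logic can also be written out step by step, which may be easier to follow than the nested call. This is only a minimal sketch, assuming the `example` data frame from the question; the intermediate names `words`, `pairs`, `counts`, `keep` and `result` are illustrative and not part of the answer above.

```r
# Data from the question (stringsAsFactors = FALSE so strsplit() also works in R < 4.0)
example <- data.frame(
  group = c("A", "B", "A", "A"),
  word  = c("car", "sun ,sun, house", "car, house", "tree"),
  stringsAsFactors = FALSE
)

# Split each cell on runs of commas and/or spaces
words <- strsplit(example$word, "[, ]+")

# Long format: one row per (group, word) pair
pairs <- data.frame(
  group = rep(example$group, lengths(words)),
  word  = unlist(words),
  stringsAsFactors = FALSE
)

# Drop repeats within a group
pairs <- unique(pairs)

# After de-duplication, a count of 1 means the word occurs in exactly one group
counts <- table(pairs$word)
keep <- pairs[pairs$word %in% names(counts)[counts == 1], ]

# Collapse back to one comma-separated string per group
result <- aggregate(word ~ group, data = keep, FUN = toString)
result
```

Using `toString` in the last step yields an ordinary character column rather than the list column produced by `c`, which tends to be easier to work with afterwards.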
ThomasIsCoding answered Apr 09 '26 22:04



Here's a dplyr solution (tidyr supplies separate_rows):

library(dplyr)
library(tidyr)
example %>% 
  separate_rows(word) %>% 
  distinct(group, word) %>% 
  group_by(word) %>% 
  filter(n() == 1) %>% 
  group_by(group) %>% 
  summarise(word = toString(word))

output

  group word       
1 A     car, tree
2 B     sun      
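
The `distinct()` step is what lets `n() == 1` mean "occurs in only one group": without it, "car" (twice in A) and "sun" (twice in B) would be filtered out as well. A small sketch of the intermediate counts, assuming the same `example` data frame; `word_counts` and `n_groups` are names chosen here for illustration only.

```r
library(dplyr)
library(tidyr)

# Data from the question
example <- data.frame(
  group = c("A", "B", "A", "A"),
  word  = c("car", "sun ,sun, house", "car, house", "tree"),
  stringsAsFactors = FALSE
)

word_counts <- example %>%
  separate_rows(word) %>%        # one row per word (default separator: non-alphanumeric runs)
  distinct(group, word) %>%      # each (group, word) pair at most once
  count(word, name = "n_groups") # number of groups each word appears in

word_counts
```

Here "house" is the only word with `n_groups` equal to 2, so it is the only one removed by `filter(n() == 1)` in the pipeline above.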
Maël answered Apr 10 '26 00:04
