
Find all unique values in column separated by comma

Tags: r, strsplit

I have multiple observations of one species with different observers / groups of observers and want to create a list of all unique observers. My data look like this:

data <- read.table(text="species observer
1 A,B
1 A,B
1 B,E
1 B,E
1 D,E,A,C,C
1 F", header = TRUE, stringsAsFactors = FALSE)

My output should return a list of all unique observers - so:

A,B,C,D,E,F

I tried to split the observer column on commas using the following command, but that only returns the unique combinations of observers.

all_observers <- unique(strsplit(as.character(data$observer), ","))

all_observers
[[1]]
[1] "A" "B"

[[2]]
[1] "B" "E"

[[3]]
[1] "D" "E" "A" "C" "C"

[[4]]
[1] "F"

asked Oct 27 '25 06:10 by Kanoet


2 Answers

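You're almost there. strsplit() returns a list with one character vector per row, so unique() compares whole vectors rather than individual observers. You just need to unlist the result before calling unique():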

all_observers <- unique(unlist(strsplit(as.character(data$observer), ",")))
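
For completeness, here is a minimal self-contained sketch of that approach on the sample data from the question. The final sort() and paste() step is an addition, assuming a single comma-separated string is the desired format; the name observer_string is only illustrative.

```r
data <- read.table(text = "species observer
1 A,B
1 A,B
1 B,E
1 B,E
1 D,E,A,C,C
1 F", header = TRUE, stringsAsFactors = FALSE)

# strsplit() returns a list with one character vector per row;
# unlist() flattens it into a single vector before unique() runs.
all_observers <- unique(unlist(strsplit(data$observer, ",")))
all_observers
# [1] "A" "B" "E" "D" "C" "F"

# Assumption: the result is wanted as one sorted, comma-separated string.
observer_string <- paste(sort(all_observers), collapse = ",")
observer_string
# [1] "A,B,C,D,E,F"
```

Note that unique() keeps values in order of first appearance, which is why the sort() is needed to get alphabetical output.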

answered Oct 28 '25 21:10 by Gregor Thomas


We can use separate_rows() on the 'observer' column to get one row per observer, keep the distinct rows, group by 'species', and collapse the 'observer' values into a single string with toString():

library(tidyverse)
data %>% 
   separate_rows(observer) %>% 
   distinct %>% 
   group_by(species) %>% 
   summarise(observer = toString(observer))
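
As a variation, the sketch below makes the separator explicit and returns a sorted string without spaces. It assumes the dplyr and tidyr packages are installed and that the sample data from the question is used; replacing toString() with paste(sort(...), collapse = ",") is a swap made here to match the format requested in the question, not part of the pipeline above.

```r
library(dplyr)
library(tidyr)

data <- read.table(text = "species observer
1 A,B
1 A,B
1 B,E
1 B,E
1 D,E,A,C,C
1 F", header = TRUE, stringsAsFactors = FALSE)

result <- data %>%
  separate_rows(observer, sep = ",") %>%  # one row per observer
  distinct() %>%                          # drop repeated species/observer pairs
  group_by(species) %>%
  # Assumption: a sorted, comma-separated string is the desired output.
  summarise(observer = paste(sort(observer), collapse = ","))

result$observer
# [1] "A,B,C,D,E,F"
```

Because the result is grouped by species, this form also works when the data contain more than one species, giving one row of observers per species.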

answered Oct 28 '25 19:10 by akrun


