R summarize unique values across columns based on values from one column

I want to count the unique values in each column, grouped by the values of var_1.

For example:

Test <- data.frame(
  var_1 = c("a", "a", "a", "b", "b", "c", "c", "c", "c", "c"),
  var_2 = c("bl", "bf", "bl", "bl", "bf", "bl", "bl", "bf", "bc", "bg"),
  var_3 = c("cf", "cf", "eg", "cf", "cf", "eg", "cf", "dr", "eg", "fg")
)

The results I am looking for would be based on the values in var_1 and should be:

var_1 var_2 var_3
a     2     2
b     2     1
c     4     4

However, after trying various methods (including apply and table), aggregate has come closest to what I am looking for. The script below, though, returns the total number of entries for each value of var_1 rather than the number of unique entries:

agbyv1 <- aggregate(. ~ var_1, Test, length)

var_1 var_2 var_3
a     3     3
b     2     2
c     5     5

I tried

unqbyv1= aggregate(. ~ var_1, Test, length(unique(x)))

but that didn't work.

Any help is greatly appreciated.

Asked by Ina.Quest on May 05 '15 at 18:05


1 Answer

Try

library(dplyr)
Test %>%
      group_by(var_1) %>% 
      summarise_each(funs(n_distinct(.)))
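
In recent dplyr releases summarise_each() and funs() are deprecated. A minimal sketch of the same grouped count, assuming dplyr 1.0.0 or later (where across() is available) and reusing the example data from the question:

```r
library(dplyr)

# Example data from the question.
Test <- data.frame(
  var_1 = c("a", "a", "a", "b", "b", "c", "c", "c", "c", "c"),
  var_2 = c("bl", "bf", "bl", "bl", "bf", "bl", "bl", "bf", "bc", "bg"),
  var_3 = c("cf", "cf", "eg", "cf", "cf", "eg", "cf", "dr", "eg", "fg")
)

# Count the distinct values of every non-grouping column within each var_1 group.
result <- Test %>%
  group_by(var_1) %>%
  summarise(across(everything(), n_distinct))

print(result)
```

For the example data this gives 2 and 2 for a, 2 and 1 for b, and 4 and 4 for c (group c has four distinct var_2 values: bl, bf, bc, bg).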

Or

library(data.table)  # v1.9.5+
setDT(Test)[, lapply(.SD, uniqueN), var_1]

If there are NAs that should not be counted as a value:

setDT(Test)[, lapply(.SD, function(x) uniqueN(na.omit(x))), var_1]
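
To illustrate why the NA variant matters, here is a small base R sketch. The vector x is made up for this example and is not part of the question's data:

```r
# Hypothetical vector with one missing value.
x <- c("bl", NA, "bl", "bf")

# unique() keeps NA as a value of its own, so it is included in the count.
n_with_na <- length(unique(x))

# Dropping NA first counts only the observed values.
n_without_na <- length(unique(na.omit(x)))

print(c(with_na = n_with_na, without_na = n_without_na))
```

Here the first count is 3 (bl, NA, bf) and the second is 2 (bl, bf).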

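Or you can use aggregate. With the formula interface, na.action defaults to na.omit, so no modification is needed. Note that na.omit drops every row that contains an NA in any column before FUN is applied.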

aggregate(. ~ var_1, Test, FUN = function(x) length(unique(x)))
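
For context, the attempt in the question most likely failed because length(unique(x)) is an expression that R evaluates immediately (with no x defined at that point), whereas FUN has to be a function. A small sketch of the same call with a named helper, where count_unique is a hypothetical name introduced here for illustration:

```r
# Example data from the question.
Test <- data.frame(
  var_1 = c("a", "a", "a", "b", "b", "c", "c", "c", "c", "c"),
  var_2 = c("bl", "bf", "bl", "bl", "bf", "bl", "bl", "bf", "bc", "bg"),
  var_3 = c("cf", "cf", "eg", "cf", "cf", "eg", "cf", "dr", "eg", "fg")
)

# Hypothetical helper: number of distinct values in a vector.
count_unique <- function(x) length(unique(x))

# Pass the function itself as FUN; aggregate calls it once per group and column.
unq_by_v1 <- aggregate(. ~ var_1, data = Test, FUN = count_unique)

print(unq_by_v1)
```

With the example data, the counts per group are 2 and 2 for a, 2 and 1 for b, and 4 and 4 for c.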

Answered by akrun on Oct 05 '22 at 21:10