I want to count the number of unique values in each column, grouped by the values of var_1.
For example:
Test <- data.frame(var_1 = c("a", "a", "a", "b", "b", "c", "c", "c", "c", "c"),
                   var_2 = c("bl", "bf", "bl", "bl", "bf", "bl", "bl", "bf", "bc", "bg"),
                   var_3 = c("cf", "cf", "eg", "cf", "cf", "eg", "cf", "dr", "eg", "fg"))
Grouped by var_1, the result I am looking for is:
var_1 var_2 var_3
a     2     2
b     2     1
c     4     4
I have tried various methods (including apply and table). aggregate has come closest to what I need, but the script below counts the total number of entries for each value of var_1, not the number of unique entries:
agbyv1= aggregate(. ~ var_1, Test, length)
var_1 var_2 var_3
a 3 3
b 2 2
c 5 5
I tried
unqbyv1= aggregate(. ~ var_1, Test, length(unique(x)))
but that didn't work.
Any help is greatly appreciated.
To extract the unique values across multiple columns of an R data frame, first read those columns in matrix form (for example with as.matrix()) and flatten them into a single vector. unique() can then be applied to that vector.
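A minimal sketch of that approach, assuming the Test data frame from the question; combining var_2 and var_3 is only an illustrative choice of columns:

```r
# Same example data as in the question
Test <- data.frame(
  var_1 = c("a", "a", "a", "b", "b", "c", "c", "c", "c", "c"),
  var_2 = c("bl", "bf", "bl", "bl", "bf", "bl", "bl", "bf", "bc", "bg"),
  var_3 = c("cf", "cf", "eg", "cf", "cf", "eg", "cf", "dr", "eg", "fg")
)

# Read the selected columns as a character matrix, then flatten it
# (column by column) into one plain vector without dim attributes
vals <- as.vector(as.matrix(Test[, c("var_2", "var_3")]))

# Keep each distinct value once, in order of first appearance
unique_vals <- unique(vals)
print(unique_vals)
```

Because as.vector() reads the matrix column by column, the distinct var_2 values appear before the distinct var_3 values.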
To find the unique values in a column of a data frame, use R's unique() function, which returns the distinct values of a vector with duplicates removed (applied to a whole data frame, it returns the distinct rows). This makes it a common first step in exploratory data analysis.
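For instance, with the question's data (rebuilt here so the snippet runs on its own), unique() on one column, and then length(unique()) applied per var_1 group using base R's tapply(). tapply() is just one way to get a single column of the desired result, not the only one:

```r
Test <- data.frame(
  var_1 = c("a", "a", "a", "b", "b", "c", "c", "c", "c", "c"),
  var_2 = c("bl", "bf", "bl", "bl", "bf", "bl", "bl", "bf", "bc", "bg"),
  var_3 = c("cf", "cf", "eg", "cf", "cf", "eg", "cf", "dr", "eg", "fg")
)

# Distinct values of one column, in order of first appearance
u <- unique(Test$var_2)
print(u)          # "bl" "bf" "bc" "bg"
print(length(u))  # 4

# Number of distinct var_2 values within each var_1 group
counts <- tapply(Test$var_2, Test$var_1, function(x) length(unique(x)))
print(counts)
```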
distinct() is a dplyr function that selects the distinct (unique) rows of a data frame, optionally considering only the columns you name.
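A short sketch, assuming dplyr is installed: keeping the distinct (var_1, var_2) pairs and then counting rows per var_1 gives the number of unique var_2 values in each group.

```r
library(dplyr)

Test <- data.frame(
  var_1 = c("a", "a", "a", "b", "b", "c", "c", "c", "c", "c"),
  var_2 = c("bl", "bf", "bl", "bl", "bf", "bl", "bl", "bf", "bc", "bg"),
  var_3 = c("cf", "cf", "eg", "cf", "cf", "eg", "cf", "dr", "eg", "fg")
)

# Keep one copy of each distinct (var_1, var_2) combination
pairs <- distinct(Test, var_1, var_2)

# One row per distinct pair, so counting rows per var_1
# gives the number of unique var_2 values in each group
pair_counts <- count(pairs, var_1)
print(pair_counts)
```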
Try
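library(dplyr)
# summarise_each() and funs() are deprecated; across() is the current replacement.
# n_distinct() is applied to every non-grouping column within each var_1 group.
Test %>%
  group_by(var_1) %>%
  summarise(across(everything(), n_distinct))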
Or
library(data.table)  # uniqueN() needs data.table v1.9.5+
setDT(Test)[, lapply(.SD, uniqueN), var_1]
If the columns contain NAs that should not be counted as a distinct value:
setDT(Test)[, lapply(.SD, function(x) uniqueN(na.omit(x))), var_1]
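A small illustration of why the NA variant matters, using a hypothetical data.table DT (not from the question) with a missing value in group "a": uniqueN() treats NA as a value of its own, while wrapping the column in na.omit() counts only observed values.

```r
library(data.table)

# Hypothetical data: var_2 has one NA in group "a"
DT <- data.table(
  var_1 = c("a", "a", "a", "b", "b"),
  var_2 = c("bl", NA, "bl", "bl", "bf")
)

# NA is counted as a distinct value: group "a" has "bl" and NA
with_na <- DT[, .(n = uniqueN(var_2)), by = var_1]

# Dropping NAs first: group "a" has only "bl"
without_na <- DT[, .(n = uniqueN(na.omit(var_2))), by = var_1]

print(with_na)
print(without_na)
```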
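Or you can use aggregate. Its formula method uses na.action = na.omit by default, so rows containing NAs are dropped without any modification. Note that the entire row is dropped, so its values in the other columns are not counted either.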
aggregate(.~ var_1, Test, FUN=function(x) length(unique(x)) )
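A sketch of that row-dropping behaviour with hypothetical data (df is illustrative, not from the question), comparing the default na.omit with na.action = na.pass plus dropping NAs inside FUN, which removes NAs column by column instead:

```r
# Hypothetical data: row 2 has NA in var_2 but a unique "eg" in var_3
df <- data.frame(
  var_1 = c("a", "a", "a", "b", "b"),
  var_2 = c("bl", NA, "bl", "bl", "bf"),
  var_3 = c("cf", "eg", "cf", "cf", "cf")
)

# Default na.action = na.omit: row 2 is removed entirely,
# so "eg" is not counted for var_3 in group "a"
agg_omit <- aggregate(. ~ var_1, df, FUN = function(x) length(unique(x)))

# Keep all rows, and drop NAs separately inside each column
agg_pass <- aggregate(. ~ var_1, df,
                      FUN = function(x) length(unique(na.omit(x))),
                      na.action = na.pass)

print(agg_omit)
print(agg_pass)
```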