I want to get the number of unique values in each of the columns of a data frame. Let's say I have the following data frame:
DF <- data.frame(v1 = c(1,2,3,2), v2 = c("a","a","b","b"))
Then it should return that there are 3 distinct values for v1, and 2 for v2.
I tried unique(DF), but that does not work: it returns the unique rows, and here every row is different.
In pandas, you can get the unique values of one or more DataFrame columns with Series.unique() or pandas.unique(). Series.unique() returns the unique values of a single column; pandas.unique() can be applied to the combined values of several columns.
A pandas Series (a single column) has a unique() method that returns only the distinct values in that column. To extend this to several columns, use pandas.concat() to combine the desired columns into a single Series, then call unique() on the result.
Unique values are the distinct values in the dataset: each value is counted once, at its first occurrence, no matter how many times it is repeated.
To get the unique values across multiple columns of a DataFrame, merge the contents of those columns into a single Series and call unique() on it. This returns the distinct elements themselves; take the length of the result to get their count.
Or using unique:
rapply(DF, function(x) length(unique(x)))
v1 v2
3 2
sapply(DF, function(x) length(unique(x)))
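As a minimal sketch of the base R approach above, vapply() can stand in for sapply() when the result should always be an integer vector. The helper name count_unique and its na_rm argument are hypothetical, introduced only for illustration. Note that unique() treats NA as a value of its own, so a column containing NA counts one extra value unless the NAs are dropped first.

```r
# Hypothetical helper (not from the original answers):
# number of distinct values in each column of a data frame.
count_unique <- function(df, na_rm = FALSE) {
  vapply(df, function(x) {
    if (na_rm) x <- x[!is.na(x)]  # optionally ignore missing values
    length(unique(x))
  }, integer(1))
}

DF <- data.frame(v1 = c(1, 2, 3, 2), v2 = c("a", "a", "b", "b"))
counts <- count_unique(DF)
counts
# v1 v2
#  3  2
```

Unlike sapply(), vapply() raises an error if any column yields something other than a single integer, which can make unexpected input easier to spot.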
In dplyr:
library(dplyr)
DF %>% summarise(across(everything(), n_distinct))
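Building on the dplyr approach above, here is a small sketch of how missing values affect the count. It assumes dplyr 1.0 or later (for across()); the data frame DF2 is made up for illustration. By default n_distinct() counts NA as a distinct value, and na.rm = TRUE excludes it.

```r
library(dplyr)

# Made-up example data containing missing values.
DF2 <- data.frame(v1 = c(1, 2, NA, 2), v2 = c("a", "a", "b", NA))

# Default: NA is counted as its own value (v1 = 3, v2 = 3).
with_na <- DF2 %>% summarise(across(everything(), n_distinct))

# na.rm = TRUE: NA is ignored (v1 = 2, v2 = 2).
without_na <- DF2 %>%
  summarise(across(everything(), ~ n_distinct(.x, na.rm = TRUE)))

with_na
without_na
```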