
show unique values for each column

Tags: r, dplyr, purrr

I am trying to create a data frame containing the column type and the unique values of each column.

I can get the column type in the desired data-frame format using map(df, class) %>% bind_rows() %>% gather(key = col_name, value = col_class), but I cannot get the unique values into a data frame instead of a list.

Below is a small data frame and code that returns the unique values as a list rather than a data frame. Ideally I could do this in one map() call, but joining two results would not be a big deal.


df <- data.frame(v1 = c(1,2,3,2), v2 = c("a","a","b","b"))

library(tidyverse)

map(df, class) %>% bind_rows() %>% gather(key = col_name, value = col_class)

map(df, unique)

When I try the same method on map(df, unique) as on map(df, class), I get the error "Error: Argument 2 must be length 3, not 2". The error is expected, but I am not sure how to get around it.

alexb523 asked Dec 13 '22 11:12


2 Answers

The number of unique values is different in those two columns, so the results cannot be bound as equal-length columns. You need to reduce each column's unique values to a single element.

df2 <- map(df, ~str_c(unique(.x),collapse = ",")) %>% 
    bind_rows() %>% 
    gather(key = col_name, value = col_unique)
> df2
# A tibble: 2 x 2
  col_name col_unique
  <chr>    <chr>    
1 v1       1,2,3    
2 v2       a,b   
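
The question also asked for the class and the unique values in one step. A minimal sketch of that, assuming one row per column is the goal: build the tibble directly from names(df) and two map_chr() calls, which skips the bind_rows()/gather() step. The names summary_tbl, col_class and col_unique are only illustrative, and class(.x)[1] keeps just the first class for columns that have several.

```r
library(purrr)
library(tibble)

df <- data.frame(v1 = c(1, 2, 3, 2), v2 = c("a", "a", "b", "b"))

# One row per column of df: its name, its (first) class,
# and its unique values collapsed into a single string.
summary_tbl <- tibble(
  col_name   = names(df),
  col_class  = unname(map_chr(df, ~ class(.x)[1])),
  col_unique = unname(map_chr(df, ~ paste(unique(.x), collapse = ",")))
)

summary_tbl$col_unique
# "1,2,3" "a,b"
```

paste() converts factors to their labels, so this should give the same strings whether v2 is stored as a factor or as a character column.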
yusuzech answered Jan 02 '23 00:01


We could use map_df to get the class and the unique values of each column into one tibble. Since the columns hold values of different types, we need to convert them to one common class (character) so they can be bound together in one data frame.

purrr::map_df(df,~tibble::tibble(class = class(.), value = as.character(unique(.))))

#  class  value
#  <chr>  <chr>
#1 numeric 1    
#2 numeric 2    
#3 numeric 3    
#4 factor  a    
#5 factor  b    

Or, if you want only one row for every column, we could collapse the unique values into a single string:

map_df(df, ~tibble(class = class(.), value = toString(unique(.))))

#  class   value  
#  <chr>   <chr>  
#1 numeric 1, 2, 3
#2 factor  a, b   

The same in base R using lapply:

do.call(rbind, lapply(df, function(x) 
       data.frame(class = class(x), value = as.character(unique(x)))))

and

do.call(rbind, lapply(df, function(x) 
        data.frame(class = class(x), value = toString(unique(x)))))
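
If converting everything to character is not wanted, another option is to keep the unique values in a list-column, so each row holds a vector of whatever length and type the column has. This is only a sketch of that idea, not part of the answers above; unique_tbl and its column names are made up for illustration.

```r
library(tibble)

df <- data.frame(v1 = c(1, 2, 3, 2), v2 = c("a", "a", "b", "b"))

# One row per column of df. col_unique is a list-column, so the unique
# values keep their original type and may differ in length per row.
unique_tbl <- tibble(
  col_name   = names(df),
  col_class  = unname(vapply(df, function(x) class(x)[1], character(1))),
  col_unique = unname(lapply(df, unique))
)

unique_tbl$col_unique[[1]]      # numeric vector: 1 2 3
lengths(unique_tbl$col_unique)  # 3 2
```

Note that the class reported for v2 depends on the R version: data.frame() creates a factor by default before R 4.0.0 and a character column from 4.0.0 on.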
Ronak Shah answered Jan 02 '23 00:01