 

How to group rows and list their associated values in R?

I can't phrase the question well because of my limited English. I want to see the associations between cells in my data frame. That is, say I pick "row a": how do I collect all of the values associated with "row a", and then make a data frame where each row holds the list of its associations? I know several dplyr functions like group_by and group_split, but I couldn't get far with them. Here is my goal:

ex_df <- data.frame(Tracts  = c(500, 200, 420, 317, 783, 200, 200, 500, 317, 783),
                    Cluster = c(1, 2, 3, 4, 4, 5, 1, 2, 4, 3))
#gives:
#       Tracts Cluster
# 1     500       1
# 2     200       2
# 3     420       3
# 4     317       4
# 5     783       4
# 6     200       5
# 7     200       1
# 8     500       2
# 9     317       4
# 10    783       3

# Now, how do I get a data frame that holds a list (a character vector is also fine)
# of the associated clusters for each tract? Something like this:

#Required output:
#    Tracts Contained_cluster
# 1   500       1,2
# 2   200       1,2,5
# 3   420       3
# 4   317       4
# 5   783       3,4

I couldn't search properly because of my limited English. If this question is a duplicate, please let me know. Also, feel free to rephrase the question. Thank you.

CaseebRamos asked Mar 17 '20 14:03


2 Answers

Using aggregate, we can create a comma-separated string of the unique Cluster values for each Tract.

aggregate(Cluster~Tracts, ex_df, function(x) toString(sort(unique(x))))

#  Tracts Cluster
#1    200 1, 2, 5
#2    317       4
#3    420       3
#4    500    1, 2
#5    783    3, 4
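Since the question says a list is also acceptable, here is a minimal base-R sketch that keeps each tract's clusters as a numeric vector instead of a single string. It relies only on split() and lapply(); the names cluster_list and res are illustrative, and Contained_cluster is borrowed from the question's required output.

```r
# Sample data from the question
ex_df <- data.frame(Tracts  = c(500, 200, 420, 317, 783, 200, 200, 500, 317, 783),
                    Cluster = c(1, 2, 3, 4, 4, 5, 1, 2, 4, 3))

# split() groups the Cluster values by Tracts (groups come out sorted);
# lapply() then keeps the sorted unique clusters of each group
cluster_list <- lapply(split(ex_df$Cluster, ex_df$Tracts),
                       function(x) sort(unique(x)))

# Store the named list in a data frame as a list column
res <- data.frame(Tracts = as.numeric(names(cluster_list)))
res$Contained_cluster <- unname(cluster_list)
res
```

Each element of res$Contained_cluster is a plain numeric vector, so it can be used directly (for example, res$Contained_cluster[[1]] gives the clusters of tract 200) without splitting a string again.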

Or the same comma-separated result using dplyr:

library(dplyr)
ex_df %>% group_by(Tracts) %>% summarise(Cluster = toString(sort(unique(Cluster))))
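If you want a real list instead of a string, a sketch of the dplyr variant wraps the result in list(), which makes summarise() store one whole vector per Tract in a list column. The column name Contained_cluster is taken from the question's required output.

```r
suppressPackageStartupMessages(library(dplyr))

# Sample data from the question
ex_df <- data.frame(Tracts  = c(500, 200, 420, 317, 783, 200, 200, 500, 317, 783),
                    Cluster = c(1, 2, 3, 4, 4, 5, 1, 2, 4, 3))

# list() keeps each group's sorted unique clusters as a vector
# instead of collapsing them into one string
res <- ex_df %>%
  group_by(Tracts) %>%
  summarise(Contained_cluster = list(sort(unique(Cluster))))
res
```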
Ronak Shah answered Oct 01 '22 04:10


EDIT: I hadn't noticed that you need only one row per Tract; for that, the answer is to use summarise instead of mutate.

Another answer using dplyr (Ronak Shah is really fast indeed ^^):

ex_df  <- data.frame(Tracts= c(500, 200, 420, 317, 783, 200, 200, 500, 317, 783),
             Cluster = c(1, 2, 3, 4, 4, 5,1, 2 ,4,3))

suppressPackageStartupMessages( library(dplyr) )

# --- If you need one row per Tract
# (sort(unique()) drops repeated clusters, e.g. "4, 4" becomes "4")
ex_df %>% 
    group_by(Tracts) %>% 
    summarise(Cluster = paste(sort(unique(Cluster)), collapse = ", ")) %>%  
    arrange(Tracts)

# --- If you want to keep every row and annotate it with its group's clusters
ex_df %>% 
    group_by(Tracts) %>% 
    mutate(Cluster = paste(Cluster, collapse = ", ")) %>% 
    ungroup() %>% 
    arrange(Tracts)
#> # A tibble: 10 x 2
#>    Tracts Cluster
#>     <dbl> <chr>  
#>  1    200 2, 5, 1
#>  2    200 2, 5, 1
#>  3    200 2, 5, 1
#>  4    317 4, 4   
#>  5    317 4, 4   
#>  6    420 3      
#>  7    500 1, 2   
#>  8    500 1, 2   
#>  9    783 4, 3   
#> 10    783 4, 3
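If you start from the per-row mutate() version but then want the question's required output, one possible sketch is to remove repeated clusters inside the group and then keep one row per Tract with distinct(). The column name Contained_cluster is borrowed from the question; summarise() remains the more direct route.

```r
suppressPackageStartupMessages(library(dplyr))

# Sample data from the question
ex_df <- data.frame(Tracts  = c(500, 200, 420, 317, 783, 200, 200, 500, 317, 783),
                    Cluster = c(1, 2, 3, 4, 4, 5, 1, 2, 4, 3))

res <- ex_df %>%
  group_by(Tracts) %>%
  # every row gets its group's sorted, de-duplicated clusters
  mutate(Contained_cluster = paste(sort(unique(Cluster)), collapse = ", ")) %>%
  ungroup() %>%
  # distinct() on the named columns keeps only those columns
  # and drops the now-identical duplicate rows
  distinct(Tracts, Contained_cluster) %>%
  arrange(Tracts)
res
```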
cbo answered Oct 01 '22 02:10