Group by columns and summarize a column into a list

Tags: r, group-by, dplyr

I have a dataframe like this:

sample_df<-data.frame(
   client=c('John', 'John','Mary','Mary'),
   date=c('2016-07-13','2016-07-13','2016-07-13','2016-07-13'),
   cluster=c('A','B','A','A'))

#sample data frame
   client date         cluster
1  John   2016-07-13    A 
2  John   2016-07-13    B 
3  Mary   2016-07-13    A 
4  Mary   2016-07-13    A

I would like to transform it into a different format, like this:

#ideal data frame
   client date         cluster
1  John   2016-07-13    c('A','B') 
2  Mary   2016-07-13    A

The 'cluster' column should hold a list when a client belongs to more than one cluster on the same date.

I thought I could do it with the dplyr package, using a command like the one below:

library(dplyr)
ideal_df <- sample_df %>% 
    group_by(client, date) %>% 
    summarize(
        # some anonymous function
    )

However, I don't know how to write the anonymous function in this situation. Is there a way to transform the data into the ideal format?

asked Jul 13 '16 09:07

Johnny Chiu

1 Answer

We can use toString to concatenate the unique elements of 'cluster' into a single string after grouping by 'client' and 'date'

r1 <- sample_df %>% 
         group_by(client, date) %>%
         summarise(cluster = toString(unique(cluster)))

Or another option would be to create a list column

r2 <- sample_df %>%
         group_by(client, date) %>% 
         summarise(cluster = list(unique(cluster)))

which we can unnest back into one row per value

library(tidyr)
r2 %>%
    ungroup() %>%
    unnest(cluster)
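
To make the difference between the two results concrete, here is a minimal sketch that rebuilds `sample_df` from the question and inspects both outputs. It assumes dplyr 1.0 or later, since it uses the `.groups = "drop"` argument of `summarise()` to return an ungrouped result; that argument is not part of the original answer.

```r
library(dplyr)

# Sample data from the question; stringsAsFactors = FALSE keeps
# 'cluster' as character on older R versions too.
sample_df <- data.frame(
  client  = c("John", "John", "Mary", "Mary"),
  date    = c("2016-07-13", "2016-07-13", "2016-07-13", "2016-07-13"),
  cluster = c("A", "B", "A", "A"),
  stringsAsFactors = FALSE
)

# Option 1: one comma-separated string per (client, date) group.
r1 <- sample_df %>%
  group_by(client, date) %>%
  summarise(cluster = toString(unique(cluster)), .groups = "drop")

# Option 2: a list column holding a character vector per group.
r2 <- sample_df %>%
  group_by(client, date) %>%
  summarise(cluster = list(unique(cluster)), .groups = "drop")

r1$cluster           # "A, B" "A"
r2$cluster[[1]]      # "A" "B"
lengths(r2$cluster)  # 2 1
```

The string version is easier to print or export, while the list column keeps each cluster as a separate value that can be indexed or unnested later.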

answered Oct 05 '22 15:10

akrun
