I have a dataframe like this:
sample_df<-data.frame(
client=c('John', 'John','Mary','Mary'),
date=c('2016-07-13','2016-07-13','2016-07-13','2016-07-13'),
cluster=c('A','B','A','A'))
#sample data frame
client date cluster
1 John 2016-07-13 A
2 John 2016-07-13 B
3 Mary 2016-07-13 A
4 Mary 2016-07-13 A
I would like to transform it into a different format, like this:
#ideal data frame
client date cluster
1 John 2016-07-13 c('A','B')
2 Mary 2016-07-13 A
The 'cluster' column should be a list whenever a client belongs to more than one cluster on the same date.
I thought I could do it with the dplyr package, with a command like the one below:
library(dplyr)
ideal_df<-sample_df %>%
group_by(client, date) %>%
summarize(cluster = ...) # some anonymous function goes here
However, I don't know how to write the anonymous function in this situation. Is there a way to transform the data into the ideal format?
The summarise_all() function in dplyr applies a function to every non-grouping column of a data frame and returns the summarised columns. Its first argument is the data frame whose columns are to be summarised.
A useful feature of group_by() is that it can group by more than one variable, so the summary is computed for each combination of the grouping variables. To do this, separate the variable names with commas inside group_by().
The group_by() function in dplyr (part of the tidyverse) can be used to accomplish this. When working with categorical variables, group_by() divides the data into subgroups based on the distinct values of those variables.
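As a rough illustration of grouping by more than one variable, the sketch below counts rows and distinct clusters for each client/date combination in the question's sample data. The column names `n_rows` and `n_clusters` are made up for this example, and the `.groups = "drop"` argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Sample data from the question
sample_df <- data.frame(
  client  = c('John', 'John', 'Mary', 'Mary'),
  date    = c('2016-07-13', '2016-07-13', '2016-07-13', '2016-07-13'),
  cluster = c('A', 'B', 'A', 'A')
)

# One output row per (client, date) combination
counts <- sample_df %>%
  group_by(client, date) %>%
  summarise(
    n_rows     = n(),                  # rows in the group
    n_clusters = n_distinct(cluster),  # distinct cluster labels in the group
    .groups    = "drop"                # return an ungrouped result (dplyr >= 1.0.0)
  )

print(counts)
```

A group with `n_clusters` greater than 1 is exactly the case where the question wants several cluster labels collected into one cell.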
We can use toString to concatenate the unique elements of 'cluster' after grouping by 'client' and 'date':
r1 <- sample_df %>%
group_by(client, date) %>%
summarise(cluster = toString(unique(cluster)))
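For comparison, the same idea can be sketched in base R without dplyr. This swaps in `aggregate()` with a formula in place of `group_by()` and `summarise()`; it is an alternative shown for illustration, not part of the answer above, and the name `r1_base` is made up here.

```r
# Sample data from the question
sample_df <- data.frame(
  client  = c('John', 'John', 'Mary', 'Mary'),
  date    = c('2016-07-13', '2016-07-13', '2016-07-13', '2016-07-13'),
  cluster = c('A', 'B', 'A', 'A')
)

# toString() joins the elements of a vector with ", "
joined <- toString(unique(c('A', 'B', 'A')))

# Base R alternative: collapse the unique clusters per client/date
r1_base <- aggregate(
  cluster ~ client + date,
  data = sample_df,
  FUN  = function(x) toString(unique(x))
)

print(r1_base)
```

Note that with toString the 'cluster' column is a plain character string such as "A, B", not a list, so the individual labels have to be split again if they are needed separately.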
Another option is to create a list column:
r2 <- sample_df %>%
group_by(client, date) %>%
summarise(cluster = list(unique(cluster)))
which we can unnest back into one row per value:
library(tidyr)
r2 %>%
ungroup() %>%
unnest(cluster)
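To make the list-column round trip concrete, here is a minimal sketch that builds the list column, checks how many clusters each cell holds, and unnests it again. It assumes dplyr 1.0.0 or later (for `.groups`) and tidyr 1.0.0 or later (where the column to unnest is named explicitly); the name `long_df` is made up for this example.

```r
library(dplyr)
library(tidyr)

# Sample data from the question
sample_df <- data.frame(
  client  = c('John', 'John', 'Mary', 'Mary'),
  date    = c('2016-07-13', '2016-07-13', '2016-07-13', '2016-07-13'),
  cluster = c('A', 'B', 'A', 'A')
)

# Collect the unique clusters of each client/date into a list column
r2 <- sample_df %>%
  group_by(client, date) %>%
  summarise(cluster = list(unique(cluster)), .groups = "drop")

# Number of distinct clusters stored in each cell of the list column
cluster_counts <- lengths(r2$cluster)

# Expand the list column back to one row per cluster value
long_df <- r2 %>%
  unnest(cluster)

print(long_df)
```

Because unique() drops repeated labels before the list is built, unnesting gives one row per distinct client/date/cluster combination rather than the original four rows.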