Can I aggregate a dataframe and retain string variables in R?

Tags: dataframe, r, aggregate, plyr

I have a data frame of the form:

  Family Code Length Type
1      A    1     11 Alpha
2      A    3      8 Beta
3      A    3      9 Beta
4      B    4      7 Alpha
5      B    5      8 Alpha
6      C    6      2 Beta
7      C    6      5 Beta
8      C    6      4 Beta

I would like to reduce the data set to one containing unique values of Code by taking a mean of Length values, but to retain all string variables too, i.e.

  Family Code Length Type
1      A    1     11 Alpha
2      A    3    8.5 Beta
3      B    4      7 Alpha
5      B    5      8 Alpha
6      C    6   3.67 Beta

I've tried aggregate() and ddply() but these seem to replace strings with NA and I'm struggling to find a way round this.

asked Oct 24 '11 21:10

R_usr

1 Answer

Since Family and Type are constant within a Code group, you can "group" on those as well without changing anything when you use ddply. If your original data set is dat, then

library(plyr)
ddply(dat, .(Family, Code, Type), summarize, Length=mean(Length))

gives

  Family Code  Type    Length
1      A    1 Alpha 11.000000
2      A    3  Beta  8.500000
3      B    4 Alpha  7.000000
4      B    5 Alpha  8.000000
5      C    6  Beta  3.666667
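
Since the question also mentions aggregate(), here is a minimal base R sketch of the same idea, grouping on every column that is constant within Code. The data frame is rebuilt from the question's printout; the column types (numeric Code, character Family and Type) are my assumption, as the original types are not shown.

```r
# Rebuild the example data from the question (column types are assumed)
dat <- data.frame(
  Family = c("A", "A", "A", "B", "B", "C", "C", "C"),
  Code   = c(1, 3, 3, 4, 5, 6, 6, 6),
  Length = c(11, 8, 9, 7, 8, 2, 5, 4),
  Type   = c("Alpha", "Beta", "Beta", "Alpha", "Alpha", "Beta", "Beta", "Beta"),
  stringsAsFactors = FALSE
)

# Family and Type go on the right-hand side of the formula as grouping
# variables, so they are carried through instead of being averaged
res <- aggregate(Length ~ Family + Code + Type, data = dat, FUN = mean)

# aggregate() orders rows by its own grouping rules; sort by Code for readability
res <- res[order(res$Code), ]
res
```

Passing the string columns as grouping variables, rather than as columns to be summarized with mean(), is what avoids the NA values described in the question.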

If Family and Type are not constant within a Code group, then you need to decide how to summarize/aggregate those values. In this example, I just take the single unique value, which assumes each Code has exactly one Family and one Type:

ddply(dat, .(Code), summarize, Family=unique(Family), 
  Length=mean(Length), Type=unique(Type))
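
If the strings really do vary within a Code, one possible rule (my choice for illustration, not the only option) is to collapse the distinct values into a single string. The sketch below uses a hypothetical data frame dat2, invented here so that Type differs within Code 3.

```r
library(plyr)

# Hypothetical data: Type is NOT constant within Code 3
dat2 <- data.frame(
  Family = c("A", "A", "A"),
  Code   = c(1, 3, 3),
  Length = c(11, 8, 9),
  Type   = c("Alpha", "Beta", "Gamma"),
  stringsAsFactors = FALSE
)

# Collapse the distinct strings in each group into one comma-separated value,
# so every Code still yields exactly one row
res2 <- ddply(dat2, .(Code), summarize,
  Family = paste(unique(Family), collapse = ", "),
  Length = mean(Length),
  Type   = paste(unique(Type), collapse = ", "))
res2
```

Other rules, such as keeping the first value or the most frequent one, would slot into the same place; which is appropriate depends on what the strings mean in your data.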

Update

Similar options using dplyr are:

library(dplyr)
dat %>%
  group_by(Family, Code, Type) %>%
  summarise(Length=mean(Length))

and

dat %>%
  group_by(Code) %>%
  summarise(Family=unique(Family), Length=mean(Length), Type=unique(Type))
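
A side note on the first dplyr version: with more than one grouping variable, summarise() drops only the last grouping level, so the result is still grouped by Family and Code, and recent dplyr releases print a message saying so. A minimal sketch that returns an ungrouped result, assuming dplyr 1.0.0 or later (where the .groups argument is available) and the same assumed column types for dat as in the question's printout:

```r
library(dplyr)

# Example data rebuilt from the question (column types are assumed)
dat <- data.frame(
  Family = c("A", "A", "A", "B", "B", "C", "C", "C"),
  Code   = c(1, 3, 3, 4, 5, 6, 6, 6),
  Length = c(11, 8, 9, 7, 8, 2, 5, 4),
  Type   = c("Alpha", "Beta", "Beta", "Alpha", "Alpha", "Beta", "Beta", "Beta"),
  stringsAsFactors = FALSE
)

# .groups = "drop" removes all grouping from the summarised result
res <- dat %>%
  group_by(Family, Code, Type) %>%
  summarise(Length = mean(Length), .groups = "drop")
res
```

Dropping the groups matters mostly when the result feeds into further dplyr verbs, which would otherwise keep operating per Family and Code.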

answered Sep 27 '22 22:09

Brian Diggs
