
Means multiple columns by multiple groups [duplicate]

I am trying to find the means, not including NAs, for multiple columns within a data frame by multiple groups:

airquality <- data.frame(City = c("CityA", "CityA","CityA",
                                  "CityB","CityB","CityB",
                                  "CityC", "CityC"),
                         year = c("1990", "2000", "2010", "1990", 
                                  "2000", "2010", "2000", "2010"),
                         month = c("June", "July", "August",
                                   "June", "July", "August",
                                   "June", "August"),
                         PM10 = c(runif(3), rnorm(5)),
                         PM25 = c(runif(3), rnorm(5)),
                         Ozone = c(runif(3), rnorm(5)),
                         CO2 = c(runif(3), rnorm(5)))
airquality

I list the column names with their index numbers so I know which columns to select:

nam<-names(airquality)
namelist <- data.frame(matrix(t(nam)));namelist

I want to calculate the mean by City and year for PM25, Ozone, and CO2. That means I need columns 1, 2, and 5:7.

library(reshape2)  # provides acast()
acast(airquality, year ~ City, mean, na.rm = TRUE)

But this is not really what I want because it includes the mean of something I do not need and it is not in a data frame format. I could convert it and then drop, but that seems like a very inefficient way to do it.

Is there a better way?

Jen asked Sep 20 '17 19:09




1 Answer

We can use dplyr's summarise_at to get the means of the columns of interest after grouping by City and year:

library(dplyr)
airquality %>%
   group_by(City, year) %>%
   # na.rm = TRUE is passed on to mean() so NAs are excluded
   summarise_at(vars("PM25", "Ozone", "CO2"), mean, na.rm = TRUE)
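
Since the question asks for means that exclude NAs, it may help to see how extra arguments reach mean(). The sketch below uses a made-up toy data frame (the name toy and its values are assumptions, not the question's data) with one missing PM25 value. summarise_at forwards any arguments after the function to that function, so na.rm = TRUE is applied to every selected column.

```r
library(dplyr)

# Hypothetical toy data, not from the question: one NA in PM25
toy <- data.frame(City  = c("CityA", "CityA", "CityB"),
                  year  = c("1990", "1990", "2000"),
                  PM25  = c(1, NA, 3),
                  Ozone = c(2, 4, 6))

# Arguments after the function (here na.rm = TRUE) are passed on to mean()
res_dplyr <- toy %>%
  group_by(City, year) %>%
  summarise_at(vars(PM25, Ozone), mean, na.rm = TRUE)

res_dplyr
```

Without na.rm = TRUE, the PM25 mean for CityA in 1990 would come back as NA.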

Or use across(), which was in the devel version of dplyr (‘0.8.99.9000’) when this was written and has shipped with dplyr since 1.0.0:

airquality %>%
     group_by(City, year) %>%
     # PM25:CO2 selects PM25, Ozone and CO2; the lambda drops NAs
     summarise(across(PM25:CO2, ~ mean(.x, na.rm = TRUE)))
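
If adding a package is not an option, a base R alternative (not part of the approach above) is aggregate(). This is only a sketch on a made-up toy data frame; the name toy and its values are assumptions for illustration. The data frame of value columns goes in first, the grouping columns go in by, and na.rm = TRUE is forwarded to mean().

```r
# Hypothetical toy data, not from the question: one NA in PM25
toy <- data.frame(City  = c("CityA", "CityA", "CityB"),
                  year  = c("1990", "1990", "2000"),
                  PM25  = c(1, NA, 3),
                  Ozone = c(2, 4, 6))

# Value columns first, grouping columns in `by`; the result is a data frame
res_base <- aggregate(toy[c("PM25", "Ozone")],
                      by = toy[c("City", "year")],
                      FUN = mean, na.rm = TRUE)

res_base
```

The result is a plain data frame with one row per City and year combination, so no conversion or column dropping is needed afterwards.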
akrun answered Oct 24 '22 04:10
