I am analyzing a set of data with many columns (almost 30 columns). I want to group data based on two columns and apply sum and mean functions to all the columns except timestamp. How would I use summarise_each on all columns except timestamp?
This is the draft code I have, but it is obviously not correct. It also generates an error because sum cannot be applied to the POSIXt data type (Error: 'sum' not defined for "POSIXt" objects):
features <- dataset %>%
  group_by(X, Y) %>%
  summarise_each(funs(mean, sum)) %>%
  arrange(TIMESTAMP)
df %>% summarise_at(which(sapply(df, is.numeric) & names(df) != 'Registered'), sum)
If you wanted to just summarise all but one column, you could do
df %>% summarise_at(vars(-Registered), sum)
but in this case you also have to check that the remaining columns are numeric.
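In case it helps, here is a minimal sketch of the same "numeric columns except one" selection written with across() and where(), assuming dplyr 1.0.0 or later. The data frame and its columns Name, Score and Count are made up for illustration; only Registered comes from the snippet above.

```r
library(dplyr)

# Hypothetical data: `Registered` is numeric but should not be summed,
# and `Name` is non-numeric so it must be skipped as well.
df <- data.frame(
  Name = c("a", "b", "c"),
  Registered = c(2001, 2005, 2010),
  Score = c(1.5, 2.5, 3.0),
  Count = c(1L, 2L, 3L)
)

# Keep only numeric columns, then drop `Registered` from that selection.
totals <- df %>%
  summarise(across(where(is.numeric) & !Registered, sum))

print(totals)
```

The selection is done by tidyselect, so there is no need to compute column positions by hand with which() and sapply().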
The summarise_all function in dplyr applies a function to every non-grouping column of the data frame. The output data frame contains one summarised value for each of those columns (one row per group if the data is grouped). The function or functions to apply are passed as the .funs argument.
Try summarise_each(funs(mean, sum), -TIMESTAMP) to exclude TIMESTAMP from the summarisation.
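For what it's worth, summarise_each and funs are deprecated in current dplyr, so here is a rough sketch of the same idea using across(), assuming dplyr 1.0.0 or later. The toy dataset and its columns V1 and V2 are made up for illustration; only X, Y and TIMESTAMP come from the question.

```r
library(dplyr)

# Hypothetical stand-in for `dataset`: two grouping columns, a timestamp,
# and two numeric columns (the real data has about 30).
dataset <- data.frame(
  X = c("a", "a", "b", "b"),
  Y = c("u", "u", "v", "v"),
  TIMESTAMP = as.POSIXct("2020-01-01 00:00:00", tz = "UTC") + 0:3,
  V1 = c(1, 2, 3, 4),
  V2 = c(10, 20, 30, 40)
)

features <- dataset %>%
  group_by(X, Y) %>%
  # across() skips the grouping columns on its own; -TIMESTAMP drops the timestamp.
  # A named list of functions gives columns named <column>_<function>.
  summarise(across(-TIMESTAMP, list(mean = mean, sum = sum)), .groups = "drop") %>%
  arrange(X, Y)

print(features)
```

Note that TIMESTAMP is not part of the summarised result, so the arrange(TIMESTAMP) step from the question would fail after the summarise; the sketch sorts by the grouping columns instead.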