What I want to do: arrange data by name and class, sort by year, calculate the mean of the values and keep the valueMessage
.
Sample data:
name <- c("a", "a", "b", "b")
class <- c("c1", "c1", "c3", "c3")
year <- c("2010", "2010", "2008", "2008")
value <- c(100, 33, 100, 90)
valueMessage <-c(NA, "meh", NA, NA)
df <- data.frame(name, class, year, value, valueMessage)
df
dat <- aggregate(df$value, list(year = df$year, name = df$name, class=df$class, valueMessage=df$valueMessage), mean)
dat <- dat[with(dat, order(class, name, year)), ]
Returns this:
year name class valueMessage x
1 2010 a c1 meh 33
But I would like it to keep the NA and return something like this:
year name class valueMessage x
1 2010 a c1 meh, NA 66.5
2 2008 b c3 NA 95
The thing is that you're trying to do two different aggregations: one for your "value" column, and one for your "valueMessage" column.
If that's the case, you would have to do each separately and merge
them, or look at using a package, like "data.table" to help out.
With "data.table", you can do the following:
library(data.table)
DT <- data.table(df)
DT[, list(value = mean(value),
valueMessage = list(unique(valueMessage))),
by = list(year, name, class)]
# year name class value valueMessage
# 1: 2010 a c1 66.5 NA,meh
# 2: 2008 b c3 95.0 NA
Note that the "valueMessage" column is a list
. You could use paste
if you wanted it to be a character vector instead. In other words:
DT[, list(value = mean(value),
valueMessage = paste(unique(valueMessage), collapse = ",")),
by = list(year, name, class)]
Here, I've assumed you're only interested in unique "valueMessage" values.
If you wanted to go with base R's aggregate
, you could try something like the following:
M1 <- aggregate(value ~ year + name + class, df, mean)
M2 <- aggregate(as.character(valueMessage) ~ year + name + class,
df, unique, na.action = na.pass)
merge(M1, M2)
I've used the "formula" method because I like the output better. as.character(valueMessage)
was required because that column is presently a factor
. The output of the valueMessage
column would be, again, a list
, but you can use an anonymous function and paste
if you prefer a character vector of length
1 instead.
I like to use sqldf
, because SQL is such a nice, simple and intuitive way to do this (without need of knowing thousands of R functions and their specifics and gotchas):
require(sqldf)
sqldf('
select year, name, class, avg(value),
group_concat(distinct case when valueMessage is NULL
then "NA"
else valueMessage
end) as valueMessages
from df
group by class, name, year
')
On your modified example it will produce this output:
year name class avg(value) valueMessages
1 2010 a c1 66.5 NA,meh
2 2008 b c3 95.0 NA
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With