I was wondering if someone had an nice way to achieve this. I have a data frame where each observation (=item), belonging to a particular group (= condition) has a given value:
# Create sample data.
item = rep(1:3,2) #6 items
condition = c(rep("control",3), rep("related",3)) #2 conditions
value = c(10,11,12,20,21,22) #6 values
df = data.frame(item, condition, value)
item condition value
1 1 control 10
2 2 control 11
3 3 control 12
4 1 related 20
5 2 related 21
6 3 related 22
I also have a look-up table, which contains the mean of each group:
# Create lookup table.
condition = c("control", "related")
mean = c(11,21)
table = data.frame(condition, mean)
condition mean
1 control 11
2 related 21
I want to modify my original data frame, such that it contains a new column, label
, which says "low" if the value of the item is lower than the group mean, and "high" otherwise. It should look like this:
# How the output should look like.
# If the item value is less than the group mean, write "low". Write "high" otherwise.
item = rep(1:3,2)
condition = c(rep("control",3), rep("related",3))
value = c(10,11,12,20,21,22)
label = c(rep(c("low", "high", "high"),2))
output = data.frame(item, condition, value, label)
item condition value label
1 1 control 10 low
2 2 control 11 high
3 3 control 12 high
4 1 related 20 low
5 2 related 21 high
6 3 related 22 high
If this were a matter of just copying the group mean to my original data frame, I would use merge
. But what I need is to take the group mean into account to write a new label for each item that says "low" or "high" depending on the group mean.
One thing that I tried was to first merge my data frame with the table, and then use ifelse
to compare the value column with the mean column. This works, but I also end up with a mean column in my data frame, which I don't need (I only need the label column). Of course, I could delete the mean column by hand, but it seems clunky. So I was wondering: does someone know a better/more elegant solution?
Thanks a bunch!
Here are some alternatives. (1) and (2) only use base R and (2), (3) and (5) do not create a mean column only to be explicitly deleted. In (1), (3) and (4) we used left joins although inner joins would have given the same result with this data and in the case of (1a) allow us to write (1) as a single line.
1) merge
m <- merge(df, table, all.x = TRUE)
transform(m, label = ifelse(value < mean, "low", "high"), mean = NULL)
giving:
item condition value label
1 1 control 10 low
2 2 control 11 high
3 3 control 12 high
4 1 related 20 low
5 2 related 21 high
6 3 related 22 high
1a) With an inner join it could be shortened to:
transform(merge(df, table), label = ifelse(value < mean, "low", "high"), mean = NULL)
2) match
transform(df,
label = ifelse(value < table$mean[match(condition, table$condition)], "low", "high")
)
giving the same.
3) sqldf
library(sqldf)
sqldf("select
df.*,
case when value < mean
then 'low'
else 'high'
end label
from df
left join 'table' using (condition)")
4) dplyr
library(dplyr)
df %>%
left_join(table) %>%
mutate(label = ifelse(value < mean, "low", "high")) %>%
select(- mean)
5) data.table
library(data.table)
dt <- as.data.table(df)
setkey(dt, "condition")
dt[table, label := ifelse(value < mean, "low", "high")]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With