I have a data frame mytable with outcomes for two measurements, A and B, taken on a group of persons:
person measure outcome
1 A 3.6
2 A 2.3
3 A 4.2
1 B 3.9
2 B 3.2
3 B 2.7
I want to compute for each person the difference between the scores for A and B. That is, I want to obtain:
person outcome_diff
1 -0.3
2 -0.9
3 1.5
I searched for an answer, but I only found some concerning transformations within the levels of a factor, not across them.
I finally managed to work it out by doing:
mytable$outcome[mytable$measure=="B"] <- -1*mytable$outcome[mytable$measure=="B"]
outtable <- aggregate(outcome ~ person, data=mytable, FUN=sum)
Although it works, I wonder how to do it without messing up the original table. Furthermore, this solution is quite specific for computing a difference. What could be a more general way to achieve the same thing?
I would use plyr:
library(plyr)

# For each person, subtract the B outcome from the A outcome
ddply(mytable, "person", summarize,
      outcome_diff = outcome[measure == "A"] -
                     outcome[measure == "B"])
#   person outcome_diff
# 1      1         -0.3
# 2      2         -0.9
# 3      3          1.5
Under the assumption that you always have exactly two measures, A and B, and in that order, you might also just do:
ddply(mytable, "person", summarize, outcome_diff = -diff(outcome))
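If you would rather stay in base R, something along these lines should also work without touching the original table. This is only a sketch: it rebuilds the example data from the question, assumes exactly one row per person and measure, and the names wide and outtable are just illustrative. Once the data is in wide format, any function of the two columns (a ratio, a mean, and so on) can replace the subtraction.

```r
# Example data reconstructed from the question
mytable <- data.frame(
  person  = c(1, 2, 3, 1, 2, 3),
  measure = c("A", "A", "A", "B", "B", "B"),
  outcome = c(3.6, 2.3, 4.2, 3.9, 3.2, 2.7)
)

# Reshape to one row per person and one column per measure.
# reshape() returns a new data frame; mytable itself is left unchanged.
# The resulting columns are named outcome.A and outcome.B.
wide <- reshape(mytable, idvar = "person", timevar = "measure",
                direction = "wide")

# Any function of the two columns can be used here, not only a difference
outtable <- data.frame(
  person       = wide$person,
  outcome_diff = wide$outcome.A - wide$outcome.B
)

print(outtable)
```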