R: how to compute differences based on a factor's levels?

Tags:

r

I have a data frame mytable with the outcomes of two measurements, A and B, taken on a group of people.

person measure outcome
1      A       3.6
2      A       2.3
3      A       4.2
1      B       3.9
2      B       3.2
3      B       2.7
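To make the table above reproducible, here is a minimal sketch that rebuilds it as an R data frame. The column types are an assumption, since the question does not state them: person is numeric and measure is a factor with levels "A" and "B".

```r
# Rebuild the example data frame from the question.
# Assumption: person is numeric and measure is a factor with levels "A", "B".
mytable <- data.frame(
  person  = c(1, 2, 3, 1, 2, 3),
  measure = factor(c("A", "A", "A", "B", "B", "B")),
  outcome = c(3.6, 2.3, 4.2, 3.9, 3.2, 2.7)
)
str(mytable)
```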

For each person, I want to compute the difference between the A and B scores (A minus B). That is, I want to obtain:

person outcome_diff
1      -0.3
2      -0.9
3       1.5

I searched for an answer, but I only found answers about transformations within the levels of a factor, not across them.

I finally managed to work it out by doing:

mytable$outcome[mytable$measure=="B"] <- -1*mytable$outcome[mytable$measure=="B"]
outtable <- aggregate(outcome ~ person, data=mytable, FUN=sum)

Although it works, I wonder how to do it without modifying the original table. Furthermore, this solution is specific to computing a difference. What would be a more general way to achieve the same thing?
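The sign-flip idea above can be kept without overwriting mytable. The sketch below (an illustration, not the asker's code) uses transform(), which returns a modified copy, to add a signed column named outcome_diff (a name chosen here for illustration) and then sums it per person with aggregate(). The data frame is rebuilt inline so the snippet runs on its own.

```r
mytable <- data.frame(
  person  = c(1, 2, 3, 1, 2, 3),
  measure = factor(c("A", "A", "A", "B", "B", "B")),
  outcome = c(3.6, 2.3, 4.2, 3.9, 3.2, 2.7)
)

# transform() returns a copy with the extra column; mytable is left unchanged.
# The signed value is +outcome for A and -outcome for B.
signed <- transform(mytable,
                    outcome_diff = ifelse(measure == "B", -outcome, outcome))

# Summing the signed values per person gives A - B.
outtable <- aggregate(outcome_diff ~ person, data = signed, FUN = sum)
print(outtable)
```

This still relies on the difference being expressible as a sum, so it shares the limitation the question points out.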

DvD asked Jun 19 '13 02:06




1 Answer

I would use plyr:

library(plyr)
ddply(mytable, "person", summarize,
      outcome_diff = outcome[measure == "A"] -
                     outcome[measure == "B"])
#   person outcome_diff
# 1      1         -0.3
# 2      2         -0.9
# 3      3          1.5
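For comparison, here is a base-R sketch of the same per-person subtraction, in case plyr is not installed. It is not part of the original answer: split() breaks the rows into one data frame per person and sapply() applies an arbitrary function to each piece, so the subtraction can be swapped for any other computation on a person's rows.

```r
mytable <- data.frame(
  person  = c(1, 2, 3, 1, 2, 3),
  measure = factor(c("A", "A", "A", "B", "B", "B")),
  outcome = c(3.6, 2.3, 4.2, 3.9, 3.2, 2.7)
)

# One data frame per person, named by the person value.
per_person <- split(mytable, mytable$person)

# Apply any function to each person's rows; here, A minus B.
diffs <- sapply(per_person, function(d) {
  d$outcome[d$measure == "A"] - d$outcome[d$measure == "B"]
})

outtable <- data.frame(person = as.numeric(names(diffs)),
                       outcome_diff = unname(diffs))
print(outtable)
```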

If you can assume that each person always has exactly two measures, A and B, stored in that order, you could also just do ddply(mytable, "person", summarize, outcome_diff = -diff(outcome)). Here diff(outcome) returns B minus A, so negating it gives A minus B.
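The -diff() shortcut depends on row order. A minimal base-R sketch of one way to enforce that order first (an addition, not from the answer): sort by person and then by measure, so A precedes B within each person because "A" is the first factor level, and then apply -diff() per person with tapply(). The shuffled copy is only there to show that the original row order no longer matters; exactly two measures per person is still assumed.

```r
mytable <- data.frame(
  person  = c(1, 2, 3, 1, 2, 3),
  measure = factor(c("A", "A", "A", "B", "B", "B")),
  outcome = c(3.6, 2.3, 4.2, 3.9, 3.2, 2.7)
)

# Deliberately scramble the rows to show that sorting restores A-before-B order.
shuffled <- mytable[c(6, 1, 4, 2, 5, 3), ]

# Order by person, then by measure (factor codes: A = 1, B = 2).
sorted <- shuffled[order(shuffled$person, shuffled$measure), ]

# diff(x) is x[2] - x[1], i.e. B - A, so negate it to get A - B.
diffs <- tapply(sorted$outcome, sorted$person, function(x) -diff(x))
print(diffs)
```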

flodel answered Nov 15 '22 06:11