I am new to R and to this list. I hope that the question that follows is not too basic or uninformed. I have been checking in the archives for the past few hours to no avail, so here I am posting. Part of the issue is that I don't exactly know the proper terminology to use when refering to the functions that I need, which can make searching difficult. That being said, here is what I need to solve:
I have a data frame that looks like the following:
Subject Item Region RT
13 102 1 R1 1245
14 102 4 R1 1677
15 102 7 R1 1730
25 103 1 R1 815
26 103 4 R1 828
27 103 7 R1 985
1489 102 1 R2 356
1490 102 4 R2 510
1491 102 7 R2 544
1501 103 1 R2 447
1502 103 4 R2 486
1503 103 7 R2 221
...
Each subject has an RT (reaction time) for multiple regions of one item. And each subject sees multiple items.
I wish to compute outliers and then normalize them (though I'm not really going to worry about that solution in this thread). As a first step, I used some simple functions to compute the mean and SD for each Region for each subject, collapsing across items (i.e, (average of all the RTs that subject has in that region):
Mean = with(test, aggregate(RT, by = list(Subject,Region),mean, na.rm=TRUE))
SD = with(test, aggregate(RT, by = list(Subject,Region),sd, na.rm=TRUE))
I then used cbind and did some renaming to get the data all in one dataframe:
Subject Region Mean SD
1 102 R1 1143.7778 202.25530
2 102 R2 431.8611 125.84393
9 103 R1 923.0833 179.51098
10 103 R2 344.1667 146.51192
...
The issue is that I now need to associate all of the means with the correct regions for each subject. That is, I would like to generate output that looks like this (note that all Subject 102 Region R1s have the same mean and SD, but different RTs etc.):
Subject Item Region RT Mean SD
13 102 1 R1 1245 1143.7778 202.25530
14 102 4 R1 1677 1143.7778 202.25530
15 102 7 R1 1730 1143.7778 202.25530
25 103 1 R1 815 923.0833 179.51098
26 103 4 R1 828 923.0833 179.51098
27 103 7 R1 985 923.0833 179.51098
1489 102 1 R2 356 431.8611 125.84393
1490 102 4 R2 510 431.8611 125.84393
1491 102 7 R2 544 431.8611 125.84393
1501 103 1 R2 447 344.1667 146.51192
1502 103 4 R2 486 344.1667 146.51192
1503 103 7 R2 221 344.1667 146.51192
It seems that merge and cbind are not going to do the job of extending and matching one value to another. Perhaps I need to make use of melt or some function that uses a key?
I hope that someone can either point me to the relevant function for me to read up on so that I can try this on my own, or just help with some code.
Thanks for reading...
You could accomplish this task using ddply function from plyr package. Using ddply and ave function:
test <- read.table(text="
Subject Item Region RT
13 102 1 R1 1245
14 102 4 R1 1677
15 102 7 R1 1730
25 103 1 R1 815
26 103 4 R1 828
27 103 7 R1 985
1489 102 1 R2 356
1490 102 4 R2 510
1491 102 7 R2 544
1501 103 1 R2 447
1502 103 4 R2 486
1503 103 7 R2 221", header=T)
library(plyr)
ddply(test, .(Subject, Region), transform, Mean=ave(RT), SD=ave(RT, FUN=sd))
Subject Item Region RT Mean SD
1 102 1 R1 1245 1550.6667 266.03822
2 102 4 R1 1677 1550.6667 266.03822
3 102 7 R1 1730 1550.6667 266.03822
4 102 1 R2 356 470.0000 100.17984
5 102 4 R2 510 470.0000 100.17984
6 102 7 R2 544 470.0000 100.17984
7 103 1 R1 815 876.0000 94.62029
8 103 4 R1 828 876.0000 94.62029
9 103 7 R1 985 876.0000 94.62029
10 103 1 R2 447 384.6667 143.07457
11 103 4 R2 486 384.6667 143.07457
12 103 7 R2 221 384.6667 143.07457
You can check the results using aggregate function as you already did.
> with(test, aggregate(RT, by = list(Subject,Region),mean, na.rm=TRUE))
Group.1 Group.2 x
1 102 R1 1550.6667
2 103 R1 876.0000
3 102 R2 470.0000
4 103 R2 384.6667
> with(test, aggregate(RT, by = list(Subject,Region),sd, na.rm=TRUE))
Group.1 Group.2 x
1 102 R1 266.03822
2 103 R1 94.62029
3 102 R2 100.17984
4 103 R2 143.07457
As you can see both the mean and the sd aggregated by Subject and Region are put into your data.frame (test).
EDIT
If you want to deal with NA, you may want to use the following edited code:
ddply(test, .(Subject, Region), transform,
Mean=ave(RT, FUN = function(x) mean(x, na.rm=TRUE)),
SD=ave(RT, FUN=function(x) sd(x, na.rm=TRUE)))
This can be done usng sqldf
df1<-read.table(header=T,text="Subject Item Region RT
13 102 1 R1 1245
14 102 4 R1 1677
15 102 7 R1 1730
25 103 1 R1 815
26 103 4 R1 828
27 103 7 R1 985
1489 102 1 R2 356
1490 102 4 R2 510
1491 102 7 R2 544
1501 103 1 R2 447
1502 103 4 R2 486
1503 103 7 R2 221")
df2<-read.table(header=T,text="Subject Region Mean SD
1 102 R1 1143.7778 202.25530
2 102 R2 431.8611 125.84393
9 103 R1 923.0833 179.51098
10 103 R2 344.1667 146.51192")
library(sqldf)
sqldf("SELECT df1.*,df2.Mean,df2.SD from df1,df2
WHERE df1.Region=df2.Region
GROUP BY df1.Region,df1.Subject,df1.Item")
# Subject Item Region RT Mean SD
#1 102 1 R1 1245 1143.7778 202.2553
#2 102 4 R1 1677 1143.7778 202.2553
#3 102 7 R1 1730 1143.7778 202.2553
#4 103 1 R1 815 1143.7778 202.2553
#5 103 4 R1 828 1143.7778 202.2553
#6 103 7 R1 985 1143.7778 202.2553
#7 102 1 R2 356 431.8611 125.8439
#8 102 4 R2 510 431.8611 125.8439
#9 102 7 R2 544 431.8611 125.8439
#10 103 1 R2 447 431.8611 125.8439
#11 103 4 R2 486 431.8611 125.8439
#12 103 7 R2 221 431.8611 125.8439
it is ordered by Region, Subject then Item
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With