Extending data from one data frame to multiple rows in another in R

Question

I am new to R and to this list. I hope that the question that follows is not too basic or uninformed. I have been checking in the archives for the past few hours to no avail, so here I am posting. Part of the issue is that I don't exactly know the proper terminology to use when refering to the functions that I need, which can make searching difficult. That being said, here is what I need to solve:

I have a data frame that looks like the following:

   Subject Item Region   RT  
13     102    1  R1 1245  
14     102    4  R1 1677  
15     102    7  R1 1730  
25     103    1  R1  815  
26     103    4  R1  828  
27     103    7  R1  985  
1489     102    1  R2 356  
1490     102    4  R2 510  
1491     102    7  R2 544  
1501     103    1  R2 447  
1502     103    4  R2 486  
1503     103    7  R2 221  
...

Each subject has an RT (reaction time) for multiple regions of one item. And each subject sees multiple items.

I wish to compute outliers and then normalize them (though I'm not really going to worry about that solution in this thread). As a first step, I used some simple functions to compute the mean and SD for each Region for each subject, collapsing across items (i.e, (average of all the RTs that subject has in that region):

Mean = with(test, aggregate(RT, by = list(Subject,Region),mean, na.rm=TRUE))  
SD = with(test, aggregate(RT, by = list(Subject,Region),sd, na.rm=TRUE))

I then used cbind and did some renaming to get the data all in one dataframe:

Subject Region      Mean         SD  
1       102  R1 1143.7778  202.25530  
2       102  R2  431.8611  125.84393  
9       103  R1  923.0833  179.51098  
10      103  R2  344.1667  146.51192  
...

The issue is that I now need to associate all of the means with the correct regions for each subject. That is, I would like to generate output that looks like this (note that all Subject 102 Region R1s have the same mean and SD, but different RTs etc.):

Subject Item Region   RT Mean         SD  
13     102    1  R1 1245 1143.7778  202.25530  
14     102    4  R1 1677 1143.7778  202.25530  
15     102    7  R1 1730 1143.7778  202.25530  
25     103    1  R1  815 923.0833  179.51098  
26     103    4  R1  828 923.0833  179.51098  
27     103    7  R1  985 923.0833  179.51098  
1489     102    1  R2 356 431.8611  125.84393   
1490     102    4  R2 510 431.8611  125.84393  
1491     102    7  R2 544 431.8611  125.84393  
1501     103    1  R2 447 344.1667  146.51192  
1502     103    4  R2 486 344.1667  146.51192  
1503     103    7  R2 221 344.1667  146.51192

It seems that merge and cbind are not going to do the job of extending and matching one value to another. Perhaps I need to make use of melt or some function that uses a key?

I hope that someone can either point me to the relevant function for me to read up on so that I can try this on my own, or just help with some code.

Thanks for reading...

Jilber Urbina · Accepted Answer

You could accomplish this task using ddply function from plyr package. Using ddply and ave function:

test <- read.table(text="
Subject Item Region   RT  
13     102    1  R1 1245  
14     102    4  R1 1677  
15     102    7  R1 1730  
25     103    1  R1  815  
26     103    4  R1  828  
27     103    7  R1  985  
1489     102    1  R2 356  
1490     102    4  R2 510  
1491     102    7  R2 544  
1501     103    1  R2 447  
1502     103    4  R2 486  
1503     103    7  R2 221", header=T)

library(plyr)
ddply(test, .(Subject, Region), transform, Mean=ave(RT), SD=ave(RT, FUN=sd))
   Subject Item Region   RT      Mean        SD
1      102    1     R1 1245 1550.6667 266.03822
2      102    4     R1 1677 1550.6667 266.03822
3      102    7     R1 1730 1550.6667 266.03822
4      102    1     R2  356  470.0000 100.17984
5      102    4     R2  510  470.0000 100.17984
6      102    7     R2  544  470.0000 100.17984
7      103    1     R1  815  876.0000  94.62029
8      103    4     R1  828  876.0000  94.62029
9      103    7     R1  985  876.0000  94.62029
10     103    1     R2  447  384.6667 143.07457
11     103    4     R2  486  384.6667 143.07457
12     103    7     R2  221  384.6667 143.07457

You can check the results using aggregate function as you already did.

> with(test, aggregate(RT, by = list(Subject,Region),mean, na.rm=TRUE))  
  Group.1 Group.2         x
1     102      R1 1550.6667
2     103      R1  876.0000
3     102      R2  470.0000
4     103      R2  384.6667
> with(test, aggregate(RT, by = list(Subject,Region),sd, na.rm=TRUE))
  Group.1 Group.2         x
1     102      R1 266.03822
2     103      R1  94.62029
3     102      R2 100.17984
4     103      R2 143.07457

As you can see both the mean and the sd aggregated by Subject and Region are put into your data.frame (test).

EDIT

If you want to deal with NA, you may want to use the following edited code:

ddply(test, .(Subject, Region), transform, 
      Mean=ave(RT, FUN = function(x) mean(x, na.rm=TRUE)),
      SD=ave(RT, FUN=function(x) sd(x, na.rm=TRUE)))

shhhhimhuntingrabbits · Answer

This can be done usng sqldf

df1<-read.table(header=T,text="Subject Item Region   RT  
13     102    1  R1 1245  
14     102    4  R1 1677  
15     102    7  R1 1730  
25     103    1  R1  815  
26     103    4  R1  828  
27     103    7  R1  985  
1489     102    1  R2 356  
1490     102    4  R2 510  
1491     102    7  R2 544  
1501     103    1  R2 447  
1502     103    4  R2 486  
1503     103    7  R2 221")

df2<-read.table(header=T,text="Subject Region      Mean         SD  
1       102  R1 1143.7778  202.25530  
2       102  R2  431.8611  125.84393  
9       103  R1  923.0833  179.51098  
10      103  R2  344.1667  146.51192")  

library(sqldf)
sqldf("SELECT df1.*,df2.Mean,df2.SD from df1,df2
        WHERE df1.Region=df2.Region 
         GROUP BY df1.Region,df1.Subject,df1.Item")

#   Subject Item Region   RT      Mean       SD
#1      102    1     R1 1245 1143.7778 202.2553
#2      102    4     R1 1677 1143.7778 202.2553
#3      102    7     R1 1730 1143.7778 202.2553
#4      103    1     R1  815 1143.7778 202.2553
#5      103    4     R1  828 1143.7778 202.2553
#6      103    7     R1  985 1143.7778 202.2553
#7      102    1     R2  356  431.8611 125.8439
#8      102    4     R2  510  431.8611 125.8439
#9      102    7     R2  544  431.8611 125.8439
#10     103    1     R2  447  431.8611 125.8439
#11     103    4     R2  486  431.8611 125.8439
#12     103    7     R2  221  431.8611 125.8439

it is ordered by Region, Subject then Item

Extending data from one data frame to multiple rows in another in R

Tags:

dataframe

r

D T

2 Answers

Jilber Urbina

shhhhimhuntingrabbits

Recent Activity

Donate For Us

Extending data from one data frame to multiple rows in another in R

Tags:

dataframe

r

D T

2 Answers

Jilber Urbina

shhhhimhuntingrabbits

Related questions

Recent Activity

Donate For Us