Getting the mean of multiple rows based on intervals from another data frame in R

Let's say I have the following data frame:

df1=read.table(text="ID POSITION S1 S2 
1   1 10 10
1   2 20 0
1   3 10 0
1   4 20 0
1   5 10 50
2   1 10 0
2   2 20 10
2   3 20 10
2   4 20 10
2   5 20 10", header=TRUE)

The rows are sorted by ID, then by POSITION.

I also have a second data frame with interval coordinates:

df2=read.table(text="ID POSITION_START POSITION_END
1  1              3
1  4              5
2  1              5", header=TRUE)

How can I compute the means of S1 and S2 in df1 over each of these intervals, matching on ID and POSITION?

The final data frame would be:

df3=read.table(text="ID POSITION_START POSITION_END S1 S2
1  1              3            13.33  3.3
1  4              5            15  25
2  1              5            18  8", header=TRUE)

where S1 and S2 are the means over each interval (the first row covers ID = 1, positions 1 to 3). It may be important that positions are not always equal to row numbers.

I've tried using the IRanges package, to no avail.

Asked by Gabriel G. on Oct 30 '25 17:10



1 Answer

  • With data.table, you can use a non-equi join:
library(data.table)

setDT(df1)
setDT(df2)

df1[
  df2,
  .(ID, POSITION_START, POSITION_END, S1, S2),
  on = .(ID, POSITION >= POSITION_START, POSITION <= POSITION_END)
][, lapply(.SD, mean), by = ID:POSITION_END]

and you will obtain

      ID POSITION_START POSITION_END       S1        S2
   <int>          <int>        <int>    <num>     <num>
1:     1              1            3 13.33333  3.333333
2:     1              4            5 15.00000 25.000000
3:     2              1            5 18.00000  8.000000

  • With base R, you can use aggregate + subset + merge, e.g.,
aggregate(
  . ~ ID + POSITION_START + POSITION_END,
  subset(
    merge(df1, df2, all.x = TRUE),
    POSITION >= POSITION_START & POSITION <= POSITION_END,
    select = -POSITION
  ), mean
)

which gives

  ID POSITION_START POSITION_END       S1        S2
1  1              1            3 13.33333  3.333333
2  2              1            5 18.00000  8.000000
3  1              4            5 15.00000 25.000000
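Note that the base R result comes back in a different row order than df2, and that merge() first pairs every df1 row with every df2 row of the same ID before subset() filters them. As a minimal alternative sketch (not part of the original answer), you could instead loop over the rows of df2 with vapply() and pick df1 rows by their ID and POSITION values rather than by row number. The helper name interval_means() is hypothetical, and the data frames are recreated so the snippet runs on its own.

```r
# Data from the question
df1 <- read.table(text = "ID POSITION S1 S2
1 1 10 10
1 2 20 0
1 3 10 0
1 4 20 0
1 5 10 50
2 1 10 0
2 2 20 10
2 3 20 10
2 4 20 10
2 5 20 10", header = TRUE)

df2 <- read.table(text = "ID POSITION_START POSITION_END
1 1 3
1 4 5
2 1 5", header = TRUE)

# Hypothetical helper: for each interval (row) of df2, return the mean of
# column `col` in df1 over rows with the same ID whose POSITION value lies
# in [POSITION_START, POSITION_END]. Rows are matched by value, not index.
interval_means <- function(df1, df2, col) {
  vapply(seq_len(nrow(df2)), function(i) {
    in_interval <- df1$ID == df2$ID[i] &
      df1$POSITION >= df2$POSITION_START[i] &
      df1$POSITION <= df2$POSITION_END[i]
    mean(df1[[col]][in_interval])  # NaN if the interval matches no rows
  }, numeric(1))
}

# Start from the intervals so the result keeps df2's row order
df3 <- df2
for (col in c("S1", "S2")) {
  df3[[col]] <- interval_means(df1, df2, col)
}
print(df3)
```

This should reproduce the expected values (13.33/3.33, 15/25, 18/8) in the same order as df2. It avoids the intermediate cross join, but it scans df1 once per interval, so the data.table non-equi join is likely the better choice for large inputs.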
Answered by ThomasIsCoding on Nov 01 '25 07:11
