Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

R: Data Frame Manipulations

Tags:

dataframe

r

I have two data frames:

>df1
type  id1  id2  id3  count1  count2  count3
a     x1   y1   z1   10      20      0
b     x2   y2   z2   20      0       30
c     x3   y3   z3   10      10      10

>df2
id   prop
x1   10
x2   5
x3   100
y1   0
y2   50
y3   80
z1   10
z2   20
z3   30

count* are like weights. So, finally I want to join the table such that TotalProp is weighted sum of prop and counts

For e.g. for the first row in df1 TotalProp = 10(prop for x1) * 10(count1) + 0(Prop for y1) * 20(count2) + 10(Prop for z1) * 0(count3) = 100

Hence my final table looks like this:

>result
type  id1  id2  id3  TotalProp
a     x1   y1   z1   100
b     x2   y2   z2   700
c     x3   y3   z3   2100

Any idea how can I do this?

Thanks.

like image 985
Rachit Agrawal Avatar asked Nov 10 '22 14:11

Rachit Agrawal


1 Answers

One line solution first and then explanation using multiple steps

df1
##   type id1 id2 id3 count1 count2 count3
## 1    a  x1  y1  z1     10     20      0
## 2    b  x2  y2  z2     20      0     30
## 3    c  x3  y3  z3     10     10     10


df2
##    id prop
## x1 x1   10
## x2 x2    5
## x3 x3  100
## y1 y1    0
## y2 y2   50
## y3 y3   80
## z1 z1   10
## z2 z2   20
## z3 z3   30

rownames(df2) <- df2$id

result <- data.frame(type = df1$type, TotalProp = rowSums(matrix(df2[unlist(df1[, c("id1", "id2", "id3")]), "prop"], nrow = nrow(df1)) * as.matrix(df1[, 
    c("count1", "count2", "count3")])))

result
##   type TotalProp
## 1    a       100
## 2    b       700
## 3    c      2100

Stepwise explanation

First we get all the id values in a vector for which we want to fetch corresponding prop values from df2

Step 1

unlist(df1[, c("id1", "id2", "id3")])
## id11 id12 id13 id21 id22 id23 id31 id32 id33 
## "x1" "x2" "x3" "y1" "y2" "y3" "z1" "z2" "z3" 

Step 2

We name the rows of df2 with df2$id. rownames(df2) <- df2$id

Step 3 Then using result from step 1, we get corresponding prop values

df2[unlist(df1[, c("id1", "id2", "id3")]), "prop"]
## [1]  10   5 100   0  50  80  10  20  30

Step 4 Convert the vector from step 3 back to 2 dimensional form

matrix(df2[unlist(df1[, c("id1", "id2", "id3")]), "prop"], nrow = nrow(df1))
##      [,1] [,2] [,3]
## [1,]   10    0   10
## [2,]    5   50   20
## [3,]  100   80   30

Step 5 Multiply result of Step 4 with counts from df1

as.matrix(df1[, c("count1", "count2", "count3")])
##      count1 count2 count3
## [1,]     10     20      0
## [2,]     20      0     30
## [3,]     10     10     10

matrix(df2[unlist(df1[, c("id1", "id2", "id3")]), "prop"], nrow = nrow(df1)) * 
       as.matrix(df1[, c("count1", "count2", "count3")])
##      count1 count2 count3
## [1,]    100      0      0
## [2,]    100      0    600
## [3,]   1000    800    300

Step 6

Apply rowSums to result from step 5 to get desired TotalProp values

rowSums(matrix(df2[unlist(df1[,c('id1','id2','id3')]),'prop'], nrow=nrow(df1)) * as.matrix(df1[,c('count1', 'count2', 'count3')]))

## [1]  100  700 2100
like image 139
CHP Avatar answered Nov 29 '22 07:11

CHP