Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Calculate ratio between all combinations of values in a row across two data sets

Tags:

r

So as mentioned in the title I have two data sets:

data 1:

structure(list(Name = structure(c(18L, 19L, 5L, 13L, 14L, 31L
), .Label = c("AMC Javelin", "Cadillac Fleetwood", "Camaro Z28", 
"Chrysler Imperial", "Datsun 710", "Dodge Challenger", "Duster 360", 
"Ferrari Dino", "Fiat 128", "Fiat X1-9", "Ford Pantera L", "Honda Civic", 
"Hornet 4 Drive", "Hornet Sportabout", "Lincoln Continental", 
"Lotus Europa", "Maserati Bora", "Mazda RX4", "Mazda RX4 Wag", 
"Merc 230", "Merc 240D", "Merc 280", "Merc 280C", "Merc 450SE", 
"Merc 450SL", "Merc 450SLC", "Pontiac Firebird", "Porsche 914-2", 
"Toyota Corolla", "Toyota Corona", "Valiant", "Volvo 142E"), class = "factor"), 
    mpg = c(145, 120, 150, 132, 110, 98), cyl = c(93, 116, 114, 
    156, 148, 167), disp = c(160, 160, 108, 258, 360, 225), hp = c(110, 
    110, 93, 110, 175, 105)), .Names = c("Name", "mpg", "cyl", 
"disp", "hp"), row.names = c(NA, 6L), class = "data.frame")

data 2:

structure(list(Name = structure(c(18L, 19L, 5L, 13L, 14L, 31L
), .Label = c("AMC Javelin", "Cadillac Fleetwood", "Camaro Z28", 
"Chrysler Imperial", "Datsun 710", "Dodge Challenger", "Duster 360", 
"Ferrari Dino", "Fiat 128", "Fiat X1-9", "Ford Pantera L", "Honda Civic", 
"Hornet 4 Drive", "Hornet Sportabout", "Lincoln Continental", 
"Lotus Europa", "Maserati Bora", "Mazda RX4", "Mazda RX4 Wag", 
"Merc 230", "Merc 240D", "Merc 280", "Merc 280C", "Merc 450SE", 
"Merc 450SL", "Merc 450SLC", "Pontiac Firebird", "Porsche 914-2", 
"Toyota Corolla", "Toyota Corona", "Valiant", "Volvo 142E"), class = "factor"), 
    mpg_1 = c(125, 133, 143, 141, 134, 238), cyl_1 = c(114, 153, 
    112, 136, 128, 155), disp_1 = c(113, 143, 144, 131, 431, 
    331), hp_1 = c(332, 221, 113, 331, 134, 151)), .Names = c("Name", 
"mpg_1", "cyl_1", "disp_1", "hp_1"), row.names = c(NA, 6L), class = "data.frame")

I would like to calculate the ratio between corresponding rows in a data sets. All the values from the row (4 columns) should be used for ratio calculation and ratio should be calculated between data sets. Using the simpler explanation:

data1[1,2] / data2[1,2]
data1[1,2] / data2[1,3]
...
data1[1,3] / data2[1,2]
...

I would like to store the results in separate data with columns labeled easily to recognize how the ratio was calculated.

like image 231
Shaxi Liver Avatar asked Jan 25 '17 13:01

Shaxi Liver


2 Answers

With lapply you could do the following. With rbind you get long format and cbind results in wide format.

Long Format:

ratioLongDF = do.call(rbind,lapply(1:ncol(DF2[,-1]),function(x) 
         data.frame(DF1[,-1]/DF2[,-1][,x],divisor=colnames(DF2[,-1])[x] ) ) )


ratioLongDF
#         mpg       cyl      disp        hp divisor
#1  1.1600000 0.7440000 1.2800000 0.8800000   mpg_1
#2  0.9022556 0.8721805 1.2030075 0.8270677   mpg_1
#3  1.0489510 0.7972028 0.7552448 0.6503497   mpg_1
#4  0.9361702 1.1063830 1.8297872 0.7801418   mpg_1
#5  0.8208955 1.1044776 2.6865672 1.3059701   mpg_1
#6  0.4117647 0.7016807 0.9453782 0.4411765   mpg_1
#7  1.2719298 0.8157895 1.4035088 0.9649123   cyl_1
#8  0.7843137 0.7581699 1.0457516 0.7189542   cyl_1
#9  1.3392857 1.0178571 0.9642857 0.8303571   cyl_1
#10 0.9705882 1.1470588 1.8970588 0.8088235   cyl_1
#11 0.8593750 1.1562500 2.8125000 1.3671875   cyl_1
#12 0.6322581 1.0774194 1.4516129 0.6774194   cyl_1
#13 1.2831858 0.8230088 1.4159292 0.9734513  disp_1
#14 0.8391608 0.8111888 1.1188811 0.7692308  disp_1
#15 1.0416667 0.7916667 0.7500000 0.6458333  disp_1
#16 1.0076336 1.1908397 1.9694656 0.8396947  disp_1
#17 0.2552204 0.3433875 0.8352668 0.4060325  disp_1
#18 0.2960725 0.5045317 0.6797583 0.3172205  disp_1
#19 0.4367470 0.2801205 0.4819277 0.3313253    hp_1
#20 0.5429864 0.5248869 0.7239819 0.4977376    hp_1
#21 1.3274336 1.0088496 0.9557522 0.8230088    hp_1
#22 0.3987915 0.4712991 0.7794562 0.3323263    hp_1
#23 0.8208955 1.1044776 2.6865672 1.3059701    hp_1
#24 0.6490066 1.1059603 1.4900662 0.6953642    hp_1 

Wide Format:

ratioWideDF = do.call(cbind,lapply(1:ncol(DF2[,-1]),function(x) {
              DF = data.frame(DF1[,-1]/DF2[,-1][,x] );
              colnames(DF)=paste0(colnames(DF),"_",colnames(DF2[,-1])[x]); 
              return(DF)} ) )   

ratioWideDF[,1:8]
#  mpg_mpg_1 cyl_mpg_1 disp_mpg_1  hp_mpg_1 mpg_cyl_1 cyl_cyl_1 disp_cyl_1  hp_cyl_1
#1 1.1600000 0.7440000  1.2800000 0.8800000 1.2719298 0.8157895  1.4035088 0.9649123
#2 0.9022556 0.8721805  1.2030075 0.8270677 0.7843137 0.7581699  1.0457516 0.7189542
#3 1.0489510 0.7972028  0.7552448 0.6503497 1.3392857 1.0178571  0.9642857 0.8303571
#4 0.9361702 1.1063830  1.8297872 0.7801418 0.9705882 1.1470588  1.8970588 0.8088235
#5 0.8208955 1.1044776  2.6865672 1.3059701 0.8593750 1.1562500  2.8125000 1.3671875
#6 0.4117647 0.7016807  0.9453782 0.4411765 0.6322581 1.0774194  1.4516129 0.6774194
like image 88
Silence Dogood Avatar answered Sep 29 '22 12:09

Silence Dogood


Here's a way using expand.grid, rep, and mapply. First, we use expand.grid to generate all the combinations of the columns we wish to iterate over. Then, we use rep to generate the rows we wish to iterate over. Then, we store these two values in a data.frame. Using the mapply function, we iterate through each row of dat_iter specifying the column and row indices we're interested in.

cols <- expand.grid(2:5, 2:5)
rows <- rep(1:6, each = 16)
dat_iter <- data.frame(rows, cols)
res <- t(mapply(x = dat_iter$rows, y = dat_iter$Var1, z = dat_iter$Var2,
       FUN = function(x, y, z) c('ratio' = data1[x, y] / data2[x, z],
                                 'd1_name' = names(data1)[y],
                                 'd2_name' = names(data2)[z],
                                 'row' = x)))

res[1:5,]

     ratio             d1_name d2_name row
[1,] "1.16"            "mpg"   "mpg_1" "1"
[2,] "0.744"           "cyl"   "mpg_1" "1"
[3,] "1.28"            "disp"  "mpg_1" "1"
[4,] "0.88"            "hp"    "mpg_1" "1"
[5,] "1.2719298245614" "mpg"   "cyl_1" "1"

Since we used mapply, you'll have to convert the first column to numeric.

like image 40
bouncyball Avatar answered Sep 29 '22 13:09

bouncyball