I have a data frame that contains an identifier (key) column followed by several value columns. I want to expand the data frame by taking unique pairs of entries in the key column as the new rows, and transforming the value columns by applying a binary operation to the entries from the corresponding rows.
E.g.
> Test_data
         SYS dE_water_free dE_water_periodic dE_membrane_periodic    RTlogKi
1 4NTJ_D294N       -56.542           -56.642                   NA -0.9629731
2  4NTJ_wild      -171.031          -162.030                   NA -0.8877264
3 4PXZ_D294N       -53.430           -50.810                   NA -1.1301124
4  4PXZ_wild       -59.990           -57.320                   NA -1.2318835
5 4PY0_D294N       -77.040           -72.880                   NA -1.1351579
6  4PY0_wild       -79.080           -74.950                   NA -1.2297302
Some of the columns may contain missing values.
What I would like is to take each pair of SYS entries, e.g. SYS1 and SYS2, and apply a binary operation to the corresponding value rows, e.g. dE_water_free(SYS==SYS1) - dE_water_free(SYS==SYS2), and so on for each value column:
        SYS1       SYS2 dE_water_free dE_water_periodic ...etc.
1 4NTJ_D294N  4NTJ_wild       114.489           105.388
2 4NTJ_D294N 4PXZ_D294N        -3.112            -5.832
... etc.
I can use the function combn() to get an array of pairs from the SYS column to form the entries in SYS1 and SYS2, but I'm not sure how to use it to build the new data frame.
I know one option would be to use something like mapply and build each column individually by hand, then paste them all into a new data frame, but that seems clunky and slow. There should be a more automatic function to do this, like reshape, merge, or recast, but I can't figure out how to make that work.
With tidyr's expand(), to find all unique combinations of x, y and z, including those not present in the data, supply each variable as a separate argument: expand(df, x, y, z). To find only the combinations that occur in the data, use nesting(): expand(df, nesting(x, y, z)). You can combine the two forms.
To add rows to an existing data frame, the new rows must have the same column structure as the existing data frame; the rbind() function then appends them and returns the combined data frame.
To merge two data frames horizontally on shared key columns, use the merge() function. To bind rows together, use rbind(), whose name stands for "row bind".
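As a minimal sketch of the two functions just described, the snippet below appends rows with rbind() and then joins a second table with merge(). The data frames `existing`, `new_rows`, and `labels`, along with their column names, are hypothetical toy data and are not taken from the question.

```r
# Hypothetical toy data; names and values are illustrative only.
existing <- data.frame(id = 1:3, value = c(10, 20, 30))
new_rows <- data.frame(id = 4:5, value = c(40, 50))

# rbind() appends rows; both data frames must have the same column names.
combined <- rbind(existing, new_rows)

# merge() joins horizontally on the shared key column "id".
# By default only ids present in both data frames are kept.
labels <- data.frame(id = c(1L, 3L, 5L), label = c("a", "c", "e"))
joined <- merge(combined, labels, by = "id")
```

Here `combined` has five rows, while `joined` keeps only the three ids that appear in both tables.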
The expand.grid() function in R creates a data frame containing every combination of the values in the vectors or factors passed to it as arguments.
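A small sketch of how expand.grid() relates to the pairing problem in the question: it returns every ordered combination, including self-pairs, so the result has to be filtered to keep each unordered pair once. The vector `sys` is a shortened, hand-typed subset of the SYS column, and the names `pairs` and `unique_pairs` are assumptions made for illustration.

```r
# Shortened subset of the SYS keys, typed in by hand for illustration.
sys <- c("4NTJ_D294N", "4NTJ_wild", "4PXZ_D294N")

# Every ordered combination, self-pairs included: 3 x 3 = 9 rows.
pairs <- expand.grid(SYS1 = sys, SYS2 = sys, stringsAsFactors = FALSE)

# Keep each unordered pair once by comparing positions in `sys`.
keep <- match(pairs$SYS1, sys) < match(pairs$SYS2, sys)
unique_pairs <- pairs[keep, ]
rownames(unique_pairs) <- NULL
```

The filtered result contains the same pairs that combn(sys, 2) would list, at the cost of first building the full grid.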
outer is well suited for this type of problem:
de_wf <- with(Test_data, setNames(dE_water_free, SYS))
outer(de_wf, de_wf, `-`)
produces:
           4NTJ_D294N 4NTJ_wild 4PXZ_D294N 4PXZ_wild 4PY0_D294N 4PY0_wild
4NTJ_D294N      0.000   114.489     -3.112     3.448     20.498    22.538
4NTJ_wild    -114.489     0.000   -117.601  -111.041    -93.991   -91.951
4PXZ_D294N      3.112   117.601      0.000     6.560     23.610    25.650
4PXZ_wild      -3.448   111.041     -6.560     0.000     17.050    19.090
4PY0_D294N    -20.498    93.991    -23.610   -17.050      0.000     2.040
4PY0_wild     -22.538    91.951    -25.650   -19.090     -2.040     0.000
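The matrix above covers one value column and lists every ordered pair. If the long format from the question is wanted (one row per unique pair, all value columns at once), one possible sketch uses combn() on the row indices, as the question suggested. This is an alternative to the outer() approach, not part of the original answer; Test_data is re-entered by hand from the question, and the names `idx`, `value_cols`, `diffs`, and `pair_diffs` are assumptions.

```r
# Test_data re-entered from the question so the sketch is self-contained.
Test_data <- data.frame(
  SYS = c("4NTJ_D294N", "4NTJ_wild", "4PXZ_D294N",
          "4PXZ_wild", "4PY0_D294N", "4PY0_wild"),
  dE_water_free = c(-56.542, -171.031, -53.430, -59.990, -77.040, -79.080),
  dE_water_periodic = c(-56.642, -162.030, -50.810, -57.320, -72.880, -74.950),
  dE_membrane_periodic = NA_real_,
  RTlogKi = c(-0.9629731, -0.8877264, -1.1301124,
              -1.2318835, -1.1351579, -1.2297302),
  stringsAsFactors = FALSE
)

# Row indices of every unordered pair: a 2 x choose(6, 2) matrix.
idx <- combn(nrow(Test_data), 2)
i <- idx[1, ]
j <- idx[2, ]

# Every column except the key is treated as a value column.
value_cols <- setdiff(names(Test_data), "SYS")

# Subtracting two equally sized row subsets applies `-` element-wise
# to all value columns at once; NA entries propagate as NA.
diffs <- Test_data[i, value_cols] - Test_data[j, value_cols]

pair_diffs <- data.frame(
  SYS1 = Test_data$SYS[i],
  SYS2 = Test_data$SYS[j],
  diffs,
  row.names = NULL,
  stringsAsFactors = FALSE
)
```

The subtraction can be swapped for any other vectorized binary operation. For six systems this yields choose(6, 2) = 15 rows, with SYS1 always the earlier row of the pair.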