
Dividing values in a column of a data frame by values from a different data frame when row values match

Tags:

r

plyr

I have a data.frame x with the following format:

     species      site  count
1:         A       1.1     25
2:         A       1.2    152
3:         A       2.1     26
4:         A       3.5      1
5:         A       3.7     98
---                         
101:       B       1.2      6
102:       B       1.3     10
103:       B       2.1      8
104:       B       2.2      8
105:       B       2.3      5

I also have another data.frame area with the following format:

      species    area
1:          A    59.7
2:          B    34.4
3:          C    37.7
4:          D    22.8

I would like to divide the count column of data.frame x by the values in the area column of data.frame area wherever the values in the species column of the two data.frames match.

I have been trying to make it work with a ddply function:

density = ddply(x, "species", mutate, density = x$count/area[,2])

But I can't figure out the proper index syntax for the area[] call so that it selects only the row matching the value found in x$species. I am very new to the plyr package (and to the apply* functions as a whole), so this may be completely the wrong approach.

I'm hoping to return a data.frame of the following format:

     species      site  count   density
1:         A       1.1     25     0.419
2:         A       1.2    152     2.546
3:         A       2.1     26     0.436
4:         A       3.5      1     0.017
5:         A       3.7     98     1.642
---                         
101:       B       1.2      6     0.174
102:       B       1.3     10     0.291
103:       B       2.1      8     0.233
104:       B       2.2      8     0.233
105:       B       2.3      5     0.145
C. Denney asked Dec 15 '22 11:12


1 Answer

This is easy with data.table:

library(data.table)
#converting your data to the native type for the package (by reference)
setDT(x); setDT(area) 
x[area, density:=count/i.area, on="species"]

:= is the natural way to add columns in data.table. It works by reference (see this vignette, particularly point b), for more about this and why it's important), so x:=y adds a column named x to your data.table and assigns it the value y.
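As a minimal, self-contained sketch of the update join above: the toy tables below copy a few rows from the question, and everything else (which rows are kept, treating site as numeric) is an assumption for illustration only.

```r
library(data.table)

# Toy data mirroring a few rows of the question's tables
x <- data.table(
  species = c("A", "A", "B", "B"),
  site    = c(1.1, 2.1, 1.2, 1.3),
  count   = c(25, 26, 6, 10)
)
area <- data.table(
  species = c("A", "B", "C", "D"),
  area    = c(59.7, 34.4, 37.7, 22.8)
)

# Update join: rows of x whose species matches a row of `area` get a new
# density column, added in place (x itself is modified, not copied)
x[area, density := count / i.area, on = "species"]

print(x)
```

Species C and D appear in area but not in x, so they simply update no rows.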

When merging in the form X[Y,], we can think of Y as selecting the rows of X to operate on. Further, when Y is a data.table, all objects in both X and Y are available in j (i.e., what comes after the comma), so we could have said density:=count/area. When we want to be sure that we're referring to one of Y's columns, we prepend its name with i. so that we know we're referring to one of the columns in i, i.e., what precedes the comma. There should be a vignette on merges forthcoming.
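The i. prefix matters most when both tables carry a column of the same name. Here is a small sketch of that case; the tables obs and factors and their columns are hypothetical, not from the question.

```r
library(data.table)

# Hypothetical tables that both have a column called `value`
obs     <- data.table(id = c("a", "b", "b"), value = c(10, 20, 30))
factors <- data.table(id = c("a", "b"),      value = c(2, 5))

# In j, a bare `value` resolves to the column of obs (the X table),
# while `i.value` picks the column of factors (the table passed as i)
obs[factors, scaled := value / i.value, on = "id"]

print(obs)
```

Without the prefix, value / value would divide the column of obs by itself, which is why the prefix is the safer habit.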

In general, as soon as you think "match across different data sets" your instinct should be to merge. For more on data.table, see here.
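For readers who would rather stay in base R, the same "merge first, then divide" idea can be sketched without any packages. This is an alternative to the data.table approach, not part of it, and the toy rows are again an assumption that copies a few values from the question.

```r
# Toy data.frames mirroring a few rows of the question's tables
x <- data.frame(
  species = c("A", "A", "B", "B"),
  site    = c(1.1, 2.1, 1.2, 1.3),
  count   = c(25, 26, 6, 10)
)
area <- data.frame(
  species = c("A", "B", "C", "D"),
  area    = c(59.7, 34.4, 37.7, 22.8)
)

# Option 1: merge on species, then divide within the merged table
merged <- merge(x, area, by = "species")
merged$density <- merged$count / merged$area

# Option 2: look up each row's area with match(), keeping x's row order
x$density <- x$count / area$area[match(x$species, area$species)]

print(x)
```

The match() form is the index syntax the question was reaching for: it returns, for each species in x, the row position of that species in area.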

MichaelChirico answered Jan 30 '23 23:01