I have a data.frame x with the following format:
species site count
1: A 1.1 25
2: A 1.2 152
3: A 2.1 26
4: A 3.5 1
5: A 3.7 98
---
101: B 1.2 6
102: B 1.3 10
103: B 2.1 8
104: B 2.2 8
105: B 2.3 5
I also have another data.frame area with the following format:
species area
1: A 59.7
2: B 34.4
3: C 37.7
4: D 22.8
I would like to divide the count column of data.frame x by the value in the area column of data.frame area, using the row of area whose species matches the species in x.
I have been trying to make it work with a ddply function:
density = ddply(x, "species", mutate, density = x$count/area[,2])
But I can't figure out the proper index syntax for the area[] call to select only the row that matches the value found in x$species. I am also very new to the plyr package (and to the apply* functions as a whole), so this may be completely the wrong approach.
I'm hoping to return a data.frame of the following format:
species site count density
1: A 1.1 25 0.419
2: A 1.2 152 2.546
3: A 2.1 26 0.436
4: A 3.5 1 0.017
5: A 3.7 98 1.641
---
101: B 1.2 6 0.174
102: B 1.3 10 0.291
103: B 2.1 8 0.233
104: B 2.2 8 0.233
105: B 2.3 5 0.145
This is easy with data.table:
library(data.table)
#converting your data to the native type for the package (by reference)
setDT(x); setDT(area)
x[area, density:=count/i.area, on="species"]
:= is the natural way to add columns in data.table. It works by reference (see this vignette, and particularly point b), for more about this and why it's important), so col:=value adds a column named col to your data.table and assigns it value.
When merging in the form X[Y,], we can think of Y as selecting the rows of X to operate on. Further, when Y is a data.table, all columns of both X and Y are available in j (i.e., what comes after the comma), so we could have said density:=count/area. When we want to be sure that we're referring to one of Y's columns, we prefix its name with i. to mark it as a column of i, i.e., of what precedes the comma. There should be a vignette on merges forthcoming.
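To illustrate what "by reference" means in practice, here is a minimal sketch on a made-up table (dt and half are hypothetical names, not part of the question's data). The point is that no assignment back to dt is needed:

```r
library(data.table)

# Hypothetical toy table, for illustration only
dt <- data.table(count = c(25, 152, 26))

# := adds the column in place; note there is no `dt <- ...` here
dt[, half := count / 2]

print(names(dt))
```

With a plain data.frame you would instead write something like df$half <- df$count / 2, which assigns a modified object back to df.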
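As a self-contained sketch of the join described above, the following rebuilds a few rows of the question's data by hand (the rows are a small subset chosen for illustration, and the tables are created directly as data.tables rather than converted with setDT):

```r
library(data.table)

# A few rows modeled on the question's data (illustrative subset)
x <- data.table(species = c("A", "A", "B", "B"),
                site    = c(1.1, 1.2, 1.2, 1.3),
                count   = c(25, 152, 6, 10))
area <- data.table(species = c("A", "B", "C"),
                   area    = c(59.7, 34.4, 37.7))

# For each row of `area` (the i argument), find the rows of x with the
# same species and add `density` to x by reference.
# i.area refers explicitly to the area column of the table in i.
x[area, density := count / i.area, on = "species"]

print(x)
```

Species C appears in area but not in x, so it simply matches no rows and nothing is updated for it.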
In general, as soon as you think "match across different data sets" your instinct should be to merge. For more on data.table, see here.
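The same "merge first, then compute" idea also works without data.table. A minimal sketch using base R's merge on plain data.frames, again with a hand-built subset of the question's data (merged is a hypothetical name):

```r
# A few rows modeled on the question's data (illustrative subset)
x <- data.frame(species = c("A", "A", "B", "B"),
                site    = c(1.1, 1.2, 1.2, 1.3),
                count   = c(25, 152, 6, 10))
area <- data.frame(species = c("A", "B", "C"),
                   area    = c(59.7, 34.4, 37.7))

# Inner join on the shared key; species C is dropped because x has no rows for it
merged <- merge(x, area, by = "species")

# Compute the density, then drop the helper column
merged$density <- merged$count / merged$area
merged$area <- NULL

print(merged)
```

Unlike the := version, this creates a new data.frame rather than modifying x in place.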