I have 3 data frames (rows: sites, columns: species names) of species abundances within sites. The number of rows is identical, but the number of columns differs, as not all species are in all three data frames. I would like to merge them into one data frame with the abundances of identical species summed up. For example:
data.frame1
      Sp1 Sp2 Sp3 Sp4
site1   1   2   3   1
site2   0   2   0   1
site3   1   1   1   1
data.frame2
      Sp1 Sp2 Sp4
site1   0   1   2
site2   1   2   0
site3   1   1   1
data.frame3
      Sp1 Sp2 Sp5 Sp6
site1   0   1   1   1
site2   1   1   1   5
site3   2   0   0   0
What I want to have is something like:
      Sp1 Sp2 Sp3 Sp4 Sp5 Sp6
site1   1   4   3   3   1   1
site2   2   5   0   1   1   5
site3   4   2   1   2   0   0
I guess I'd have to work with merge, but so far my attempts have failed to give me what I want.
Any help is appreciated.
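To make the example reproducible, the three data frames could be built as in the sketch below. The object names df1, df2 and df3 are an assumption, chosen to match the names used in the answers; the values are copied from the tables above.

```r
# Assumed object names (df1, df2, df3); values taken from the example tables.
sites <- c("site1", "site2", "site3")

df1 <- data.frame(Sp1 = c(1, 0, 1), Sp2 = c(2, 2, 1),
                  Sp3 = c(3, 0, 1), Sp4 = c(1, 1, 1),
                  row.names = sites)

df2 <- data.frame(Sp1 = c(0, 1, 1), Sp2 = c(1, 2, 1),
                  Sp4 = c(2, 0, 1),
                  row.names = sites)

df3 <- data.frame(Sp1 = c(0, 1, 2), Sp2 = c(1, 1, 0),
                  Sp5 = c(1, 1, 0), Sp6 = c(1, 5, 0),
                  row.names = sites)
```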
I'd use plyr's rbind.fill like this:
library(plyr)
# Stack the three data frames; species missing from a data frame become NA
pp <- cbind(names = c(rownames(df1), rownames(df2), rownames(df3)),
            rbind.fill(list(df1, df2, df3)))
# names Sp1 Sp2 Sp3 Sp4 Sp5 Sp6
# 1 site1 1 2 3 1 NA NA
# 2 site2 0 2 0 1 NA NA
# 3 site3 1 1 1 1 NA NA
# 4 site1 0 1 NA 2 NA NA
# 5 site2 1 2 NA 0 NA NA
# 6 site3 1 1 NA 1 NA NA
# 7 site1 0 1 NA NA 1 1
# 8 site2 1 1 NA NA 1 5
# 9 site3 2 0 NA NA 0 0
Then, aggregate with plyr's ddply as follows:
ddply(pp, .(names), function(x) colSums(x[,-1], na.rm = TRUE))
# names Sp1 Sp2 Sp3 Sp4 Sp5 Sp6
# 1 site1 1 4 3 3 1 1
# 2 site2 2 5 0 1 1 5
# 3 site3 4 2 1 2 0 0
Adding to the options available, here are two more that stick with base R.
First option: Wide aggregation (sort of)
temp <- cbind(df1, df2, df3)
temp
# Sp1 Sp2 Sp3 Sp4 Sp1 Sp2 Sp4 Sp1 Sp2 Sp5 Sp6
# site1 1 2 3 1 0 1 2 0 1 1 1
# site2 0 2 0 1 1 2 0 1 1 1 5
# site3 1 1 1 1 1 1 1 2 0 0 0
sapply(unique(colnames(temp)),
function(x) rowSums(temp[, colnames(temp) == x, drop = FALSE]))
# Sp1 Sp2 Sp3 Sp4 Sp5 Sp6
# site1 1 4 3 3 1 1
# site2 2 5 0 1 1 5
# site3 4 2 1 2 0 0
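A possible variation on the same wide idea, offered only as a sketch: base R's rowsum() sums the rows of a matrix by group, so transposing, grouping by species name, and transposing back should give the same totals. The data frames are rebuilt here under the assumed names df1, df2 and df3 so that the snippet runs on its own.

```r
# Assumed object names; values copied from the question's example.
sites <- c("site1", "site2", "site3")
df1 <- data.frame(Sp1 = c(1, 0, 1), Sp2 = c(2, 2, 1),
                  Sp3 = c(3, 0, 1), Sp4 = c(1, 1, 1), row.names = sites)
df2 <- data.frame(Sp1 = c(0, 1, 1), Sp2 = c(1, 2, 1),
                  Sp4 = c(2, 0, 1), row.names = sites)
df3 <- data.frame(Sp1 = c(0, 1, 2), Sp2 = c(1, 1, 0),
                  Sp5 = c(1, 1, 0), Sp6 = c(1, 5, 0), row.names = sites)

# Bind side by side; duplicated species names are kept as-is
temp <- as.matrix(cbind(df1, df2, df3))

# rowsum() works on rows, so transpose, sum by species name, transpose back
res <- t(rowsum(t(temp), group = colnames(temp)))
res
```

Note that rowsum() sorts the groups by default, so the species columns come out in alphabetical order rather than in order of first appearance. The result is a matrix; wrap it in as.data.frame() if a data frame is needed.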
Second option: semi-wide to long to wide
Conceptually, this is similar to Maxim. K's answer: get the data into a long form, which makes it much easier to manipulate:
> temp1 <- t(cbind(df1, df2, df3))
> # You'll get a warning in the next step
> # Safe to ignore though...
> temp2 <- data.frame(var = rownames(temp1), stack(data.frame(temp1)))
Warning message:
In data.row.names(row.names, rowsi, i) :
some row.names duplicated: 5,6,7,8,9 --> row.names NOT used
> xtabs(values ~ ind + var, temp2)
var
ind Sp1 Sp2 Sp3 Sp4 Sp5 Sp6
site1 1 4 3 3 1 1
site2 2 5 0 1 1 5
site3 4 2 1 2 0 0