First time posting here! I am having a problem using the ddply function. I have this table that I would like to summarize by the column "LC", summing the values in the column "Area":
ID LC per Area
1 1 7 0.29 62428.3
2 1 7 0.79 170063.3
3 1 4 0.40 86108.0
4 1 7 0.43 92566.1
5 1 6 1.00 215270.0
6 1 7 0.61 131314.7
Based on this dataframe I would expect exactly this:
LC Area
4 86108.0
6 215270.0
7 456372.4
Applying the ddply function I get these results:
> ddply(x, 'LC', sum)
LC V1
1 4 86113.4
2 6 215278.0
3 7 456406.5
The formatting is perfect, but there are some discrepancies in the values. For example, class 7 should have a value of 456372.4; instead, ddply reports 456406.5, a difference of 34.1. All of the values are off.
Can someone explain to me why I am having this problem? Am I missing something here? Is my code wrong?
Thank you!
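There are two problems with your approach:
1. You have to tell ddply what to sum (Area). If you don't specify the column, ddply passes each group's whole data frame to sum, which adds up the values of all columns (ID, LC, per, and Area). For class 7 that adds 4 (ID, 1 per row) + 28 (LC, 7 per row) + 2.12 (per) = 34.12 on top of the Area total, which is exactly the discrepancy you see; likewise 5.4 extra for class 4 and 8 extra for class 6.
2. You have to use the summarise argument, so that the expression is evaluated on the columns of each group.
This code works: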
x <- read.table(text=" ID LC per Area
1 1 7 0.29 62428.3
2 1 7 0.79 170063.3
3 1 4 0.40 86108.0
4 1 7 0.43 92566.1
5 1 6 1.00 215270.0
6 1 7 0.61 131314.7", header = TRUE)
library(plyr)
ddply(x, .(LC), summarise, sum(Area))
The result:
LC ..1
1 4 86108.0
2 6 215270.0
3 7 456372.4
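A minimal follow-up sketch, assuming plyr is installed and `x` is built the same way as above. It first applies plain `sum` to the LC == 7 rows, which is what `ddply(x, 'LC', sum)` does for each group, to show where the extra 34.12 comes from. It then names the argument to `summarise` so the result column is called `Area` instead of `..1`. The names `grp7`, `extra`, and `res` are just for illustration.

```r
library(plyr)

# Same data as above; the header has one fewer field than the rows,
# so the first column becomes the row names and ID is 1 in every row
x <- read.table(text = " ID LC per Area
1 1 7 0.29 62428.3
2 1 7 0.79 170063.3
3 1 4 0.40 86108.0
4 1 7 0.43 92566.1
5 1 6 1.00 215270.0
6 1 7 0.61 131314.7", header = TRUE)

# What sum() receives for class 7: the whole group, all four columns
grp7 <- x[x$LC == 7, ]
sum(grp7)        # ID + LC + per + Area
sum(grp7$Area)   # Area only

# The gap: 4 (ID) + 28 (LC) + 2.12 (per) = 34.12
extra <- sum(grp7[, c("ID", "LC", "per")])
extra

# Naming the summarise argument gives the output column a readable name
res <- ddply(x, .(LC), summarise, Area = sum(Area))
res
```

Naming the argument is optional; it only changes the header of the result, not the sums.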