error in ddply function sum?

Tags: r, plyr

First time posting here! I am having a problem using the ddply function. I have this table that I would like to summarize by the column "LC", adding up the values in the column "Area":

  ID LC  per     Area
1  1  7 0.29  62428.3
2  1  7 0.79 170063.3
3  1  4 0.40  86108.0
4  1  7 0.43  92566.1
5  1  6 1.00 215270.0
6  1  7 0.61 131314.7

Based on this dataframe I would expect exactly this:

LC   Area
4  86108.0
6 215270.0
7 456372.4

Applying the ddply function I get these results:

> ddply(x, 'LC', sum)
  LC       V1
1  4  86113.4
2  6 215278.0
3  7 456406.5

The formatting is perfect, but there are some discrepancies in the values. For example, class 7 should have a value of 456372.4, but ddply reports 456406.5, a difference of 34.1. All the values are miscalculated.

Can someone explain to me why I am having this problem? Am I missing something here? Is my code wrong?

Thank you!

user1896882 asked Dec 12 '12 06:12


1 Answer

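There are two issues with your approach:

  • You need to tell ddply what to sum (Area). Because no column is specified, sum receives each group's entire data frame and adds up the values of all columns (ID, LC, per, and Area). For LC = 7 that gives 456372.4 + 4 + 28 + 2.12 = 456406.52, which is the inflated value you see.
  • To aggregate a single column, pass summarise as the function, followed by the expression to compute for each group.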
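To see where the extra 34.1 comes from, here is a minimal base R sketch. It assumes the same data frame x as in the question; the names grp7, total_all, and total_area are only illustrative. It sums every column of the LC = 7 rows, which is effectively what sum does when it is handed the whole group.

```r
# Same data as in the question (the first column becomes the row names)
x <- read.table(text = "  ID LC  per     Area
1  1  7 0.29  62428.3
2  1  7 0.79 170063.3
3  1  4 0.40  86108.0
4  1  7 0.43  92566.1
5  1  6 1.00 215270.0
6  1  7 0.61 131314.7", header = TRUE)

# Rows belonging to class 7
grp7 <- x[x$LC == 7, ]

# sum() on a numeric data frame adds up every cell: ID, LC, per and Area
total_all <- sum(grp7)

# Summing only the Area column gives the value the question expects
total_area <- sum(grp7$Area)

round(total_all, 1)    # matches the 456406.5 reported by ddply(x, 'LC', sum)
total_area             # 456372.4
total_all - total_area # the extra 34.12 contributed by ID, LC and per
```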

This code works:

x <- read.table(text="  ID LC  per     Area
1  1  7 0.29  62428.3
2  1  7 0.79 170063.3
3  1  4 0.40  86108.0
4  1  7 0.43  92566.1
5  1  6 1.00 215270.0
6  1  7 0.61 131314.7", header = TRUE)


library(plyr)

ddply(x, .(LC), summarise, sum(Area))

The result:

  LC      ..1
1  4  86108.0
2  6 215270.0
3  7 456372.4
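
The result column is called ..1 because the expression passed to summarise has no name. As a small variation (a sketch assuming the same data, not part of the original code), naming the expression should give the "LC / Area" layout the question asks for. The aggregate call is a base R alternative added here only for comparison; res and res_base are illustrative names.

```r
library(plyr)

# Same data as above
x <- read.table(text = "  ID LC  per     Area
1  1  7 0.29  62428.3
2  1  7 0.79 170063.3
3  1  4 0.40  86108.0
4  1  7 0.43  92566.1
5  1  6 1.00 215270.0
6  1  7 0.61 131314.7", header = TRUE)

# Naming the summary expression names the output column
res <- ddply(x, .(LC), summarise, Area = sum(Area))
res

# Base R alternative without plyr: sum Area within each level of LC
res_base <- aggregate(Area ~ LC, data = x, FUN = sum)
res_base
```

Both calls should return one row per LC value with a single Area column holding the per-group totals.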

Sven Hohenstein answered Oct 19 '22 08:10