I need to compute the cumulative frequency, expressed as a percentage, of a continuous variable by factor. For example:
data <- data.frame(n = sample(1:12),
d = seq(10, 120, by = 10),
Site = rep(c("FirstSite", "SecondSite"), 6),
Plot = rep(c("Plot1", "Plot1", "Plot2", "Plot2"), 3)
)
data <- with(data, data[order(Site,Plot),])
data <- transform(data, G = ((pi * (d/2)^2) * n) / 10000)
data
    n   d       Site  Plot           G
1   7  10  FirstSite Plot1  0.05497787
5   9  50  FirstSite Plot1  1.76714587
9  12  90  FirstSite Plot1  7.63407015
3  10  30  FirstSite Plot2  0.70685835
7   5  70  FirstSite Plot2  1.92422550
11  1 110  FirstSite Plot2  0.95033178
2   3  20 SecondSite Plot1  0.09424778
6   8  60 SecondSite Plot1  2.26194671
10  6 100 SecondSite Plot1  4.71238898
4   4  40 SecondSite Plot2  0.50265482
8   2  80 SecondSite Plot2  1.00530965
12 11 120 SecondSite Plot2 12.44070691
I need the cumulative frequency of column G by factors Plot~Site in order to plot a geom_step ggplot of G against d for each plot and site.
I have managed to compute the cumulative sum of G by factor with:
data.ss <- by(data[, "G"], data[,c("Plot", "Site")], function(x) cumsum(x))
# Gtot
(data.ss.tot <- sapply(data.ss, max))
[1] 9.456194 3.581416 7.068583 13.948671
Now I need to express each Plot's G in the range [0..1], where 1 is Gtot for each Plot. I imagine I should divide G by its Plot's Gtot, then apply a new cumsum to it. How can I do it?
Please note that I have to plot this cumulative frequency against d, not G itself, so it is not a proper ecdf.
Thank you.
Thus, the less-than-type cumulative frequency for a particular value of the variable is obtained by adding its frequency to the frequencies of all the values smaller than that value.
The more-than-type cumulative frequency can then be calculated by subtracting all the preceding frequencies from the sum of all the frequencies.
Here, the cumulative frequency for 4 is 16+18+11+15=60.
The cumulative frequency is calculated by adding each frequency from a frequency distribution table to the sum of its predecessors. The last value will always be equal to the total for all observations, since all frequencies will already have been added to the previous total.
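As a rough illustration of both types, here is a small sketch in base R. The frequencies are the ones summed in the text (16, 18, 11, 15); the values 1 to 4 they belong to are an assumption made only for this example, since the original table is not shown.

```r
# Hypothetical frequency table: the values 1..4 are assumed for illustration;
# the frequencies are the ones summed in the text.
value <- 1:4
freq  <- c(16, 18, 11, 15)

# Less-than type: each frequency is added to the sum of its predecessors.
less_than <- cumsum(freq)

# More-than type: the total minus all preceding frequencies.
more_than <- sum(freq) - c(0, head(less_than, -1))

print(data.frame(value, freq, less_than, more_than))
```

The last entry of less_than equals the total number of observations, and the first entry of more_than does too.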
I usually use ddply and transform to do this type of thing:
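library(plyr); library(ggplot2)  # ddply() comes from plyr, qplot() from ggplot2
data <- ddply(data, c('Site', 'Plot'), transform, Gsum = cumsum(G), Gtot = sum(G))  # per-group running sum and total
qplot(x = d, y = Gsum/Gtot, facets = Plot ~ Site, geom = 'step', data = data)  # cumulative share of G against d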
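If you would rather not depend on plyr, the same idea can probably be sketched with base R's ave() and a plain ggplot() call (qplot() has been deprecated in recent ggplot2 releases). This is only a sketch under a few assumptions: the seed and the column name Gfrac are mine, and the explicit sort by d is there so that the running sum follows the x axis.

```r
library(ggplot2)

set.seed(1)  # assumed seed, only so that sample() is reproducible
data <- data.frame(n = sample(1:12),
                   d = seq(10, 120, by = 10),
                   Site = rep(c("FirstSite", "SecondSite"), 6),
                   Plot = rep(c("Plot1", "Plot1", "Plot2", "Plot2"), 3))
data <- transform(data, G = ((pi * (d/2)^2) * n) / 10000)

# Sort by d within each Site/Plot so the cumulative sum runs along the x axis.
data <- data[order(data$Site, data$Plot, data$d), ]

# Cumulative share of G within each Site/Plot group, scaled to [0, 1].
# Gfrac is a hypothetical column name chosen for this example.
data$Gfrac <- ave(data$G, data$Site, data$Plot,
                  FUN = function(x) cumsum(x) / sum(x))

p <- ggplot(data, aes(x = d, y = Gfrac)) +
  geom_step() +
  facet_grid(Plot ~ Site)
print(p)
```

Dividing the running sum by the group total gives the same result as dividing G by Gtot first and then applying cumsum, so a single pass per group is enough.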