I would like to extract summary statistics for the values in several columns, grouped by id. My data looks like this:
id pace type value abundance
51 (T) (JC) (L) 0
51 (T) (JC) (L) 0
51 (T) (JC) (H) 0
52 (T) (JC) (H) 0
52 (R) (JC) (H) 0
53 (T) (JC) (L) 1
53 (T) (JC) (H) 1
53 (R) (JC) (H) 1
53 (R) (JC) (H) 1
53 (R) (JC) (H) 1
54 (T) (BC) <blank> 0
54 (T) (BC) <blank> 0
54 (T) (BC) <blank> 0
and I am hoping for something like this:
id ptype (T) (R) (L) (H) abundance
51 (JC) 3 0 2 1 0
52 (JC) 1 1 0 2 0
53 (JC) 2 3 1 4 1
54 (BC) 3 0 0 0 0
I have begun writing some code:
for (i in levels(df$id))
{
  extract.event <- df[df$id == i, ]            # identify each section
  ppace <- table(extract.event$pace)           # count table of pace
  ptype <- extract.event$type[1]               # use the first row as the type
  nvalues <- table(extract.event$value)        # count table of value
  nabundance <- min(extract.event$abundance)   # minimum of abundance
  d <- cbind(ppace, ptype, forbeh, nvalues, nabundance)
}
but I am running into problems merging the values, especially when nabundance prints out an empty table. I would prefer not to extract by name, as there are so many names in the data frame. Any ideas? I thought the plyr package might help, but I am still not sure...
Thanks,
Grace
I had to rewrite your data.frame (for future reference, please paste the output of dput() so that nobody has to retype your data), but here is my attempt. I'm guessing you are looking for something along the lines of the aggregate function:
df <- data.frame(
  id = as.factor(c(51,51,51,52,52,53,53,53,53,53,54,54,54)),
  pace = c("(T)","(T)","(T)","(T)","(R)","(T)","(T)","(R)","(R)","(R)","(T)","(T)","(T)"),
  type = c("(JC)","(JC)","(JC)","(JC)","(JC)","(JC)","(JC)","(JC)","(JC)","(JC)","(BC)","(BC)","(BC)"),
  value = c("(L)","(L)","(H)","(H)","(H)","(L)","(H)","(H)","(H)","(H)","<blank>","<blank>","<blank>"),
  abundance = c(0,0,0,0,0,1,1,1,1,1,0,0,0)
)
smallnames <- colnames(do.call("cbind", as.list(
  aggregate(cbind(value, pace, abundance) ~ id + type,
            data = lapply(df, as.character), table))))
smallnames
[1] "id" "type" "(H)" "(L)" "<blank>" "(R)" "(T)" "0"
[9] "1"
df.new <- do.call("data.frame", as.list(
  aggregate(cbind(value, pace, abundance) ~ id + type,
            data = lapply(df, as.character), table)))
colnames(df.new) <- smallnames
df.new$abundance <- df.new$`1`
df.new
id type (H) (L) <blank> (R) (T) 0 1 abundance
1 54 (BC) 0 0 3 0 3 3 0 0
2 51 (JC) 1 2 0 0 3 3 0 0
3 52 (JC) 2 0 0 1 1 2 0 0
4 53 (JC) 4 1 0 3 2 0 5 5
df.final <- df.new[, !(colnames(df.new) %in% c("<blank>", "0", "1"))]
df.final
id type (H) (L) (R) (T) abundance
1 54 (BC) 0 0 0 3 0
2 51 (JC) 1 2 0 3 0
3 52 (JC) 2 0 1 1 0
4 53 (JC) 4 1 3 2 5
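In the output above, abundance for id 53 is 5 because that column holds the count of rows whose abundance is 1. If the per-id minimum is wanted instead, as in the question's loop (min(extract.event$abundance)), a minimal base R sketch could look like the following. It assumes that each id has a single type (the first one is taken) and that "<blank>" is the only value to drop; the names pace_counts, value_counts, type_by_id, min_abund and summary_df are made up for illustration. Because table() is called on two variables at once, ids with no rows for a given level get a 0 rather than a missing entry, and no level has to be picked out by name.

```r
# Same data as above, rebuilt so the sketch runs on its own
df <- data.frame(
  id = factor(c(51,51,51,52,52,53,53,53,53,53,54,54,54)),
  pace = c("(T)","(T)","(T)","(T)","(R)","(T)","(T)","(R)","(R)","(R)","(T)","(T)","(T)"),
  type = c("(JC)","(JC)","(JC)","(JC)","(JC)","(JC)","(JC)","(JC)","(JC)","(JC)","(BC)","(BC)","(BC)"),
  value = c("(L)","(L)","(H)","(H)","(H)","(L)","(H)","(H)","(H)","(H)","<blank>","<blank>","<blank>"),
  abundance = c(0,0,0,0,0,1,1,1,1,1,0,0,0),
  stringsAsFactors = FALSE
)

# Two-way count tables: one row per id, one column per observed level
pace_counts  <- as.data.frame.matrix(table(df$id, df$pace))
value_counts <- as.data.frame.matrix(table(df$id, df$value))

# Assumption: "<blank>" is a placeholder and should not be reported
value_counts <- value_counts[, setdiff(names(value_counts), "<blank>"), drop = FALSE]

# Per-id summaries, in the same order as the rows of the count tables
type_by_id <- tapply(df$type, df$id, function(x) x[1])   # first type per id
min_abund  <- tapply(df$abundance, df$id, min)           # minimum abundance per id

summary_df <- data.frame(
  id = rownames(pace_counts),
  type = as.vector(type_by_id),
  pace_counts,
  value_counts,
  abundance = as.vector(min_abund),
  check.names = FALSE,        # keep column names such as "(T)" unchanged
  stringsAsFactors = FALSE
)
rownames(summary_df) <- NULL
summary_df
```

The columns come out in sorted level order, so reorder them afterwards if a specific layout is needed.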
Let me know if this is what you are looking for or if you have trouble with it.
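On the dput() point above, here is a small sketch of how it can be used to share data. The data frame small and the variables txt and rebuilt are hypothetical and exist only for this illustration.

```r
# Hypothetical toy data frame, only for illustration
small <- data.frame(id = c(51, 52),
                    pace = c("(T)", "(R)"),
                    stringsAsFactors = FALSE)

# dput() prints an R expression that recreates the object;
# that printed text is what should be pasted into a question
dput(small)

# Round trip: capture the printed expression and evaluate it again
txt <- paste(capture.output(dput(small)), collapse = "\n")
rebuilt <- eval(parse(text = txt))
all.equal(small, rebuilt)
```

For a large data frame, dput(head(df, 20)) keeps the pasted text short.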