
Can dcast be used without an aggregate function? [duplicate]


Possible Duplicate:
This R reshaping should be simple, but

dcast from reshape2 works without an aggregation function when there are no duplicates. Take these example data:

    df <- structure(list(id = c("A", "B", "C", "A", "B", "C"),
                         cat = c("SS", "SS", "SS", "SV", "SV", "SV"),
                         val = c(220L, 222L, 223L, 224L, 225L, 2206L)),
                    .Names = c("id", "cat", "val"),
                    class = "data.frame", row.names = c(NA, -6L))

I'd like to dcast these data and just have the values tabulated, without applying any function to the value.var including the default length.

In this case, it works fine.

    > dcast(df, id~cat, value.var="val")
      id  SS   SV
    1  A 220  224
    2  B 222  225
    3  C 223 2206

But when there are duplicate id/cat combinations, fun.aggregate defaults to length. Is there a way to avoid it?

    df2 <- structure(list(id = c("A", "B", "C", "A", "B", "C", "C"),
                          cat = c("SS", "SS", "SS", "SV", "SV", "SV", "SV"),
                          val = c(220L, 222L, 223L, 224L, 225L, 220L, 1L)),
                     .Names = c("id", "cat", "val"),
                     class = "data.frame", row.names = c(NA, -7L))

    > dcast(df2, id~cat, value.var="val")
    Aggregation function missing: defaulting to length
      id SS SV
    1  A  1  1
    2  B  1  1
    3  C  1  2
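To see which id/cat combinations trigger the fallback to length, one option is to count the rows per combination before casting. This is only a sketch in base R; the names `counts` and `dups` are introduced here for illustration and do not appear in the question.

```r
# Same data as df2 in the question, rebuilt with data.frame() for brevity
df2 <- data.frame(
  id  = c("A", "B", "C", "A", "B", "C", "C"),
  cat = c("SS", "SS", "SS", "SV", "SV", "SV", "SV"),
  val = c(220L, 222L, 223L, 224L, 225L, 220L, 1L),
  stringsAsFactors = FALSE
)

# Count how many rows fall into each id/cat cell
counts <- aggregate(val ~ id + cat, data = df2, FUN = length)

# Keep only the cells that hold more than one value;
# these are the cells dcast would have to aggregate
dups <- counts[counts$val > 1, ]
dups
```

For these data, only the combination id = "C", cat = "SV" holds more than one value, which is why the SV column for C shows a count of 2 in the output above.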

Ideally what I'm looking for is to add a fun = NA, as in don't try to aggregate the value.var. The result I'd like when dcasting df2:

      id  SS  SV
    1  A 220 224
    2  B 222 225
    3  C 223 220
    4  C  NA   1
Maiasaura asked Oct 11 '12 03:10



1 Answer

I don't think there is a way to do it directly, but we can add an additional column that will help us out:

    df2 <- structure(list(id = c("A", "B", "C", "A", "B", "C", "C"),
                          cat = c("SS", "SS", "SS", "SV", "SV", "SV", "SV"),
                          val = c(220L, 222L, 223L, 224L, 225L, 220L, 1L)),
                     .Names = c("id", "cat", "val"),
                     class = "data.frame", row.names = c(NA, -7L))

    library(reshape2)
    library(plyr)

    # Add a variable for how many times the id*cat combination has occurred
    tmp <- ddply(df2, .(id, cat), transform, newid = paste(id, seq_along(cat)))

    # Aggregate using this newid and toss in the id so we don't lose it
    out <- dcast(tmp, id + newid ~ cat, value.var = "val")

    # Remove newid if we want
    out <- out[,-which(colnames(out) == "newid")]

    > out
    #  id  SS  SV
    #1  A 220 224
    #2  B 222 225
    #3  C 223 220
    #4  C  NA   1
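The same idea can probably be expressed without plyr. The following is a sketch, not part of the answer above: it numbers the occurrences within each id/cat combination using base R's `ave`, then casts with that counter on the left-hand side of the formula. The column name `occ` and the result name `wide` are assumptions made for this illustration.

```r
library(reshape2)

# Same data as df2 in the question, rebuilt with data.frame() for brevity
df2 <- data.frame(
  id  = c("A", "B", "C", "A", "B", "C", "C"),
  cat = c("SS", "SS", "SS", "SV", "SV", "SV", "SV"),
  val = c(220L, 222L, 223L, 224L, 225L, 220L, 1L),
  stringsAsFactors = FALSE
)

# Number each occurrence within an id/cat combination (1, 2, ...)
df2$occ <- ave(seq_along(df2$val), df2$id, df2$cat, FUN = seq_along)

# Each id + occ + cat cell now holds at most one value,
# so dcast has nothing to aggregate
wide <- dcast(df2, id + occ ~ cat, value.var = "val")

# Drop the helper column once the cast is done
wide$occ <- NULL
wide
```

Because the counter makes every cell unique, the result should have four rows, with C appearing twice and an NA under SS in its second row, matching the layout the question asks for.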
Dason answered Dec 31 '22 01:12
