My old code looked like this:
library(ggplot2)
gp<-ggplot(NULL,aes(x=Income))
gp<-gp+geom_density(data=dat$Male,color="blue")
gp<-gp+geom_density(data=dat$Female,color="green")
gp<-gp+geom_density(data=dat$Alien,color="red")
plot(gp) #Works
Now I have started using the excellent data.table library (instead of data.frame):
library(data.table)
cols<-c("blue","green","red")
gp<-ggplot(NULL,aes(x=Income))
dat[, list(gp+geom_density(data=.SD, color=cols[.GRP])), by=Gender]
#I even tried
dat[, list(gp<-gp+geom_density(data=.SD, color=cols[.GRP])), by=Gender]
plot(gp) #Error: No layers in plot
I am not exactly sure what is wrong, but it seems that the code I run inside j is not being recognised in the outer scope.
How can I achieve this in a data.table-idiomatic way?
ggplot2 should be used with long-format data.tables in the same way as with long-format data.frames:
library(data.table)
set.seed(42)
dat <- rbind(data.table(gender="male",value=rnorm(1e4)),
data.table(gender="female",value=rnorm(1e4,2,1))
)
library(ggplot2)
p1 <- ggplot(dat,aes(x=value,color=gender)) + geom_density()
print(p1)
Don't feed wide format data.frames (or data.tables) to ggplot2.
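If your data currently lives in one table per group, as the dat$Male / dat$Female / dat$Alien calls in the question suggest, a rough sketch of getting it into long format could look like the following. The dat_list object and its simulated Income values are made up for illustration; only the column name and the colours come from the question.

```r
library(data.table)
library(ggplot2)

# Hypothetical stand-in for the question's data: a named list with one
# table per gender, each holding an Income column (values are simulated).
set.seed(42)
dat_list <- list(
  Male   = data.table(Income = rnorm(1000, mean = 50, sd = 10)),
  Female = data.table(Income = rnorm(1000, mean = 55, sd = 10)),
  Alien  = data.table(Income = rnorm(1000, mean = 70, sd = 15))
)

# Stack the tables into long format; idcol stores each list name
# in a new Gender column.
long <- rbindlist(dat_list, idcol = "Gender")

# A single density layer, coloured by group, with the colours from the question.
cols <- c(Male = "blue", Female = "green", Alien = "red")
p <- ggplot(long, aes(x = Income, color = Gender)) +
  geom_density() +
  scale_color_manual(values = cols)
print(p)
```

This keeps the grouping inside ggplot2, so there is no need to add layers from within j at all.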
Plotting will be quite slow if you have many groups, but that is down to the internal magic of ggplot2 and is nothing data.table can really help with (until Hadley implements it somehow). You can try to calculate the densities outside ggplot2, but that will only help you so far:
set.seed(42)
dat2 <- data.table(gender=as.factor(1:5000),value=rnorm(1e7))
plotdat <- dat2[,{d <- density(value); list(x_den=d$x,y_den=d$y)},by=gender] #call density() only once per group
p2 <- ggplot(plotdat,aes(x=x_den,y=y_den,color=gender)) + geom_line()
print(p2) #this needs some CPU time
Of course, if you have that many groups, you are probably making the wrong kind of plot.