Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to split a data frame by rows, and then process the blocks?

Tags:

split

dataframe

r

I have a data frame with several columns, one of which is a factor called "site". How can I split the data frame into blocks of rows each with a unique value of "site", and then process each block with a function? The data look like this:

site year peak
ALBEN 5 101529.6
ALBEN 10 117483.4
ALBEN 20 132960.9
ALBEN 50 153251.2
ALBEN 100 168647.8
ALBEN 200 184153.6
ALBEN 500 204866.5
ALDER 5 6561.3
ALDER 10 7897.1
ALDER 20 9208.1
ALDER 50 10949.3
ALDER 100 12287.6
ALDER 200 13650.2
ALDER 500 15493.6
AMERI 5 43656.5
AMERI 10 51475.3
AMERI 20 58854.4
AMERI 50 68233.3
AMERI 100 75135.9
AMERI 200 81908.3

and I want to create a plot of year vs peak for each site.

like image 604
David Smith Avatar asked Sep 08 '09 17:09

David Smith


2 Answers

You can use isplit (from the "iterators" package) to create an iterator object that loops over the blocks defined by the site column:

require(iterators)
site.data <- read.table("isplit-data.txt",header=T) 
sites <- isplit(site.data,site.data$site)

Then you can use foreach (from the "foreach" package) to create a plot within each block:

require(foreach)
foreach(site=sites) %dopar% {
 pdf(paste(site$key[[1]],".pdf",sep=""))
 plot(site$value$year,site$value$peak,main=site$key[[1]])
 dev.off()
}

As a bonus, if you have a multiprocessor machine and call registerDoMC() first (from the "doMC" package), the loops will run in parallel, speeding things up. More details in this Revolutions blog post: Block-processing a data frame with isplit

like image 171
David Smith Avatar answered Oct 11 '22 23:10

David Smith


Another choice is use the ddply function from the ggplot2 library. But you mention you mostly want to do a plot of peak vs. year, so you could also just use qplot:

A <- read.table("example.txt",header=TRUE)
library(ggplot2)
qplot(peak,year,data=A,colour=site,geom="line",group=site)
ggsave("peak-year-comparison.png")

alt text

On the other hand, I do like David Smith's solution that allows the applying of the function to be run across several processors.

like image 33
Christopher DuBois Avatar answered Oct 11 '22 23:10

Christopher DuBois