I have a data frame with several columns, one of which is a factor called "site". How can I split the data frame into blocks of rows each with a unique value of "site", and then process each block with a function? The data look like this:
site year peak
ALBEN 5 101529.6
ALBEN 10 117483.4
ALBEN 20 132960.9
ALBEN 50 153251.2
ALBEN 100 168647.8
ALBEN 200 184153.6
ALBEN 500 204866.5
ALDER 5 6561.3
ALDER 10 7897.1
ALDER 20 9208.1
ALDER 50 10949.3
ALDER 100 12287.6
ALDER 200 13650.2
ALDER 500 15493.6
AMERI 5 43656.5
AMERI 10 51475.3
AMERI 20 58854.4
AMERI 50 68233.3
AMERI 100 75135.9
AMERI 200 81908.3
and I want to create a plot of year
vs peak
for each site.
You can use isplit
(from the "iterators" package) to create an iterator object that loops over the blocks defined by the site
column:
require(iterators)
site.data <- read.table("isplit-data.txt",header=T)
sites <- isplit(site.data,site.data$site)
Then you can use foreach
(from the "foreach" package) to create a plot within each block:
require(foreach)
foreach(site=sites) %dopar% {
pdf(paste(site$key[[1]],".pdf",sep=""))
plot(site$value$year,site$value$peak,main=site$key[[1]])
dev.off()
}
As a bonus, if you have a multiprocessor machine and call registerDoMC()
first (from the "doMC" package), the loops will run in parallel, speeding things up. More details in this Revolutions blog post: Block-processing a data frame with isplit
Another choice is use the ddply
function from the ggplot2
library. But you mention you mostly want to do a plot of peak vs. year, so you could also just use qplot
:
A <- read.table("example.txt",header=TRUE)
library(ggplot2)
qplot(peak,year,data=A,colour=site,geom="line",group=site)
ggsave("peak-year-comparison.png")
On the other hand, I do like David Smith's solution that allows the applying of the function to be run across several processors.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With