I have a relatively large dataset (16,000+ x ~31). In other words, it's large enough that I don't want to manipulate it line by line in Excel. The data is in this form:
block site day X1 X2
1 1 1 0.4 5.1
1 1 2 0.8 1.1
1 1 3 1.1 4.2
1 2 1 ... ...
1 2 2
1 2 3
2 3 1
2 3 2
2 3 3
2 4 1
2 4 2
2 4 3
As you can see, the site count is continuous but I would like a column where the site number resets with each block. For example, I would like something like this below:
block site day X1 X2 site2
1 1 1 0.4 5.1 1
1 1 2 0.8 1.1 1
1 1 3 1.1 4.2 1
1 2 1 ... ... 2
1 2 2 2
1 2 3 2
2 3 1 1
2 3 2 1
2 3 3 1
2 4 1 2
2 4 2 2
2 4 3 2
I was thinking about using the R function rle but am not sure if it will work because of complications with day. Otherwise, I would try something like:
Data$site2 <- sequence(rle(Data$block)$lengths)
Does anyone have any suggestions for adding a column counting (sequence) the number of sites within each block? If it helps, there are the same number of days (263) recorded for each site but there are a different number of sites per block.
Here's a slightly clumsy solution using plyr and ddply
:
ddply(df,.(block),transform,
site1 = rep(1:length(unique(site)),
times = rle(site)$lengths))
Or a slightly slicker version:
ddply(df,.(block),transform,site1 = as.integer(as.factor(site)))
There may be a clever way of doing this directly, though, using the various seq
, sequence
and rle
functions, but my brain is a bit hazy at the moment. If you leave this open for a bit someone will likely come along with a slick non-plyr
solution.
Using tapply could work
# Make some fake data
dat <- data.frame(block = rep(1:3, each = 4), site = rep(1:6, each = 2), val = rnorm(12))
# For each block reset the count
dat$site2 <- unlist(tapply(dat$site, dat$block, function(x){x - min(x) + 1}))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With