
How to add a counting column based on values in other columns in R

Tags: r, sequence

I have a relatively large dataset (16,000+ rows x ~31 columns), large enough that I don't want to manipulate it line by line in Excel. The data is in this form:

block  site     day  X1   X2
1      1        1    0.4  5.1 
1      1        2    0.8  1.1
1      1        3    1.1  4.2
1      2        1    ...  ...
1      2        2
1      2        3
2      3        1
2      3        2
2      3        3
2      4        1
2      4        2
2      4        3

As you can see, the site numbering runs continuously across blocks, but I would like a column where the site number resets with each block. For example, I would like something like this:

block  site     day  X1   X2    site2
1      1        1    0.4  5.1   1
1      1        2    0.8  1.1   1
1      1        3    1.1  4.2   1
1      2        1    ...  ...   2
1      2        2               2
1      2        3               2
2      3        1               1
2      3        2               1
2      3        3               1
2      4        1               2
2      4        2               2
2      4        3               2

I was thinking about using the R function rle but am not sure if it will work because of complications with day. Otherwise, I would try something like:

Data$site2 <- sequence(rle(Data$block)$lengths)

Does anyone have any suggestions for adding a column counting (sequence) the number of sites within each block? If it helps, there are the same number of days (263) recorded for each site but there are a different number of sites per block.

djhocking asked Dec 18 '11 04:12


2 Answers

Here's a slightly clumsy solution using plyr and ddply:

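library(plyr)  # provides ddply() and .()

# Assumes the rows of each site are contiguous within a block
ddply(df, .(block), transform,
      site1 = rep(seq_along(unique(site)),
                  times = rle(site)$lengths))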

Or a slightly slicker version:

ddply(df, .(block), transform, site1 = as.integer(as.factor(site)))

There may be a clever way of doing this directly, though, using the various seq, sequence and rle functions, but my brain is a bit hazy at the moment. If you leave this open for a bit someone will likely come along with a slick non-plyr solution.
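
One way the seq, sequence and rle idea might be spelled out in base R is sketched below. It is only a sketch, not the answer's own method. It assumes the rows are sorted so that each site forms one contiguous run, that each block is contiguous, and that a site id is never shared by two adjacent blocks. The toy data frame `df` and the names `runs`, `starts`, `run_block` and `site_in_block` are made up for illustration.

```r
# Toy data shaped like the question: 2 blocks, 2 sites each, 3 days per site
df <- data.frame(block = rep(1:2, each = 6),
                 site  = rep(1:4, each = 3),
                 day   = rep(1:3, times = 4))

runs      <- rle(df$site)                          # one run per site
starts    <- cumsum(c(1, head(runs$lengths, -1)))  # first row of each run
run_block <- df$block[starts]                      # block each run belongs to

# Number the runs 1, 2, ... within each block, then expand back to rows
site_in_block <- sequence(rle(run_block)$lengths)
df$site2 <- rep(site_in_block, times = runs$lengths)
```

Because the counter is built from run lengths rather than from the site values themselves, it should not matter whether the site ids are consecutive, only that the rows are ordered as assumed above.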

joran answered Oct 11 '22 15:10


Using tapply could work, provided the rows are sorted by block and the site numbers are consecutive within each block:

# Make some fake data
dat <- data.frame(block = rep(1:3, each = 4),
                  site  = rep(1:6, each = 2),
                  val   = rnorm(12))
# For each block, reset the count so the lowest site number becomes 1
dat$site2 <- unlist(tapply(dat$site, dat$block, function(x) x - min(x) + 1))
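
If the site numbers have gaps, or the rows are not sorted by block, a variant with `ave` and `match` may be more forgiving, since `ave` writes each result back to the row it came from. This is a sketch under those assumptions rather than part of the answer above. The data frame `dat2` and its values are hypothetical.

```r
# Hypothetical data: site ids have gaps and rows are not sorted by site
dat2 <- data.frame(block = c(1, 1, 1, 2, 2, 2, 2),
                   site  = c(10, 10, 12, 20, 25, 25, 20))

# Within each block, number the sites by order of first appearance
dat2$site2 <- ave(dat2$site, dat2$block,
                  FUN = function(x) match(x, unique(x)))
```

Here sites are numbered in the order they first appear within a block. Replacing the function body with `as.integer(factor(x))` would instead number them by sorted site value.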

Dason answered Oct 11 '22 13:10