
r - How to add row index to a data frame, based on combination of factors [duplicate]

Tags: r
I have a data frame like this:

df <- data.frame(
    Dim1 = c("A","A","A","A","A","A","B","B"),
    Dim2 = c(100,100,100,100,200,200,100,200),
    Value = sample(1:10, 8)
        )

  Dim1 Dim2 Value
1    A  100     3
2    A  100     6
3    A  100     7
4    A  100     4
5    A  200     8
6    A  200     9
7    B  100     2
8    B  200    10

(The Value column is just to illustrate that each row is a data point; the actual values don't matter.) Ultimately, I would like to plot the values against their index within the subset defined by Dim1 and Dim2. For this reason, I think I need to append a new column containing the indices, which would look like this (blank lines added between subsets to make the grouping obvious):

  Dim1 Dim2 Value Index
1    A  100     1     1
2    A  100     9     2
3    A  100     4     3
4    A  100    10     4

5    A  200     7     1
6    A  200     3     2

7    B  100     5     1

8    B  200     8     1

How do I do this elegantly in R? I'm coming from Python and my default approach is to for-loop over the combinations of Dim1 & Dim2, keeping track of the number of rows in each and assigning the maximum encountered so far to each row. I've been trying to figure it out but my vector-fu is weak.

jsavn asked Apr 18 '13 20:04


2 Answers

This is probably going to look like cheating since I am passing a vector into a function which I then totally ignore except to get its length:

df$Index <- ave(1:nrow(df), df$Dim1, factor(df$Dim2), FUN = function(x) 1:length(x))

The ave function returns a vector of the same length as its first argument, but computed within the categories defined by all of the factors between the first argument and the argument named FUN. (I often forget to put the "FUN=" in front of my function and get a cryptic error message along the lines of "unique() applies only to vectors", because ave then treats the anonymous function as one more grouping variable, tries to determine how many unique values it has, and fails.)
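As a minimal illustration of that grouping behaviour (the scores and group vectors are made up for this sketch and are not part of the question's data), ave computes a per-group result and places it back at the original positions:

```r
# Toy data, invented for this sketch
scores <- c(10, 20, 30, 40)
group  <- c("a", "a", "b", "b")

# With no FUN given, ave() applies mean() within each group
group_mean <- ave(scores, group)

# FUN must be passed by name, because every unnamed argument after
# the first is treated as a grouping variable
group_max <- ave(scores, group, FUN = max)

print(group_mean)  # 15 15 35 35
print(group_max)   # 20 20 40 40
```

Both results have the same length as scores, which is what makes ave convenient for adding a column to an existing data frame.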

There's actually an even more compact way of expressing function(x) 1:length(x): the seq_along function. It is probably safer too, since it handles a vector of length zero correctly by returning integer(0), whereas the anonymous function form silently returns 1:0, i.e. c(1, 0):

ave(1:nrow(df), df$Dim1, factor(df$Dim2), FUN = seq_along)
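A quick check of both points, as a sketch: the empty vector is a made-up edge case, and the Value column from the question is left out because it plays no part in computing the index.

```r
# Zero-length input: the colon form counts down from 1 to 0
empty <- numeric(0)
colon_form <- 1:length(empty)   # two elements: 1 and 0
safe_form  <- seq_along(empty)  # no elements

# The grouped index on the question's Dim1/Dim2 columns
df <- data.frame(
    Dim1 = c("A","A","A","A","A","A","B","B"),
    Dim2 = c(100,100,100,100,200,200,100,200)
)
df$Index <- ave(1:nrow(df), df$Dim1, factor(df$Dim2), FUN = seq_along)

print(length(colon_form))  # 2
print(length(safe_form))   # 0
print(df$Index)            # 1 2 3 4 1 2 1 1
```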
IRTFM answered Oct 20 '22 23:10


Here you go, using data.table:

library(data.table)
df <- data.table(
    Dim1 = c("A","A","A","A","A","A","B","B"),
    Dim2 = c(100,100,100,100,200,200,100,200),
    Value = sample(1:10, 8)
        )

df[, index := seq_len(.N), by = list(Dim1, Dim2)]
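To inspect the result, the sketch below rebuilds the table without the random Value column (the name dt and the column index2 are hypothetical). It also compares the answer's approach with data.table's rowid() helper, which the answer does not mention; this assumes an installed data.table version that provides rowid().

```r
library(data.table)

dt <- data.table(
    Dim1 = c("A","A","A","A","A","A","B","B"),
    Dim2 = c(100,100,100,100,200,200,100,200)
)

# Approach from the answer: count 1..N within each (Dim1, Dim2) group
dt[, index := seq_len(.N), by = list(Dim1, Dim2)]

# Alternative (an assumption, not from the answer): rowid() numbers
# repeated occurrences of each combination in order of appearance
dt[, index2 := rowid(Dim1, Dim2)]

print(dt)
```

Because := modifies the table by reference, no reassignment is needed after either call.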
eddi answered Oct 20 '22 23:10