r - How to add row index to a data frame, based on combination of factors [duplicate]

Question

I have a data frame like this:

df <- data.frame(
    Dim1 = c("A","A","A","A","A","A","B","B"),
    Dim2 = c(100,100,100,100,200,200,100,200),
    Value = sample(1:10, 8)
)

  Dim1 Dim2 Value
1    A  100     3
2    A  100     6
3    A  100     7
4    A  100     4
5    A  200     8
6    A  200     9
7    B  100     2
8    B  200    10

(The Value column is just to illustrate that each row is a data point; the actual value doesn't matter.) Ultimately, what I would like to do is plot the values against their index within the subset defined by Dim1 and Dim2. For this reason, I think I need to append a new column containing the indices, which would look like this (blank lines added between groups of rows to make the subsets obvious):

  Dim1 Dim2 Value Index
1    A  100     1     1
2    A  100     9     2
3    A  100     4     3
4    A  100    10     4

5    A  200     7     1
6    A  200     3     2

7    B  100     5     1

8    B  200     8     1

How do I do this elegantly in R? I'm coming from Python and my default approach is to for-loop over the combinations of Dim1 & Dim2, keeping track of the number of rows in each and assigning the maximum encountered so far to each row. I've been trying to figure it out but my vector-fu is weak.

IRTFM · Accepted Answer

This is probably going to look like cheating since I am passing a vector into a function which I then totally ignore except to get its length:

df$Index <- ave(1:nrow(df), df$Dim1, factor(df$Dim2), FUN = function(x) 1:length(x))

The ave function returns a vector of the same length as its first argument, but computed within categories defined by all of the factors between the first argument and the argument named FUN. (I often forget to put the "FUN=" in for my function and get a cryptic error message along the lines of "unique() applies only to vectors", because ave then treats the anonymous function as one more grouping variable and fails when it tries to determine how many unique values it has.)
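To make "computed within categories" a bit more concrete, here is a rough sketch that reproduces the ave call by hand with split and unsplit. It is only an equivalent formulation for this data, not a claim about how ave is written internally. The names g, by_hand and via_ave are made up for the example, and Value is fixed rather than sampled so the result is reproducible.

```r
df <- data.frame(
    Dim1 = c("A","A","A","A","A","A","B","B"),
    Dim2 = c(100,100,100,100,200,200,100,200),
    Value = c(3, 6, 7, 4, 8, 9, 2, 10)   # fixed values, assumed for illustration
)

# One grouping factor covering every Dim1/Dim2 combination
g <- interaction(df$Dim1, df$Dim2)

# Split the row numbers by group, number each piece from 1,
# then put the pieces back in the original row order
by_hand <- unsplit(lapply(split(1:nrow(df), g), seq_along), g)

# The same result through ave
via_ave <- ave(1:nrow(df), df$Dim1, factor(df$Dim2), FUN = seq_along)

print(by_hand)                  # 1 2 3 4 1 2 1 1
print(all(by_hand == via_ave))  # TRUE
```

The values passed as the first argument never matter here; only their positions within each group do, which is why 1:nrow(df) works as a placeholder.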

There's actually another, even more compact way of expressing function(x) 1:length(x): the seq_along function. It is probably safer, since it handles a vector of length zero correctly by returning integer(0), whereas the anonymous function form would silently return 1:0, i.e. c(1, 0):

ave(1:nrow(df), df$Dim1, factor(df$Dim2), FUN = seq_along)
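The zero-length case is easy to check directly. A minimal sketch, with empty, naive and safe being names chosen just for this example:

```r
empty <- integer(0)

naive <- 1:length(empty)   # 1:0 counts down, so this has two elements
safe  <- seq_along(empty)  # zero-length input gives zero-length output

print(naive)          # 1 0
print(length(safe))   # 0
```

With the data in the question every group has at least one row, so both forms give the same answer there; the difference only shows up if a function like this is reused on input that may be empty.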

eddi · Answer

Here you go, using data.table:

library(data.table)
df <- data.table(
    Dim1 = c("A","A","A","A","A","A","B","B"),
    Dim2 = c(100,100,100,100,200,200,100,200),
    Value = sample(1:10, 8)
)

df[, index := seq_len(.N), by = list(Dim1, Dim2)]
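For comparison, here is a small sketch of the same grouped numbering on fixed data, next to data.table's rowid() helper. I'm assuming a reasonably recent data.table (rowid() was added around version 1.9.8, as far as I recall). The names dt, Index and Index2 are only for illustration.

```r
library(data.table)

dt <- data.table(
    Dim1 = c("A","A","A","A","A","A","B","B"),
    Dim2 = c(100,100,100,100,200,200,100,200),
    Value = c(3, 6, 7, 4, 8, 9, 2, 10)   # fixed values, assumed for illustration
)

# .N is the number of rows in the current group, so seq_len(.N) numbers them from 1
dt[, Index := seq_len(.N), by = .(Dim1, Dim2)]

# rowid() computes the same within-group counter without an explicit by
dt[, Index2 := rowid(Dim1, Dim2)]

print(dt)
```

Both columns should come out as 1, 2, 3, 4, 1, 2, 1, 1 for this data.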
