I have a data frame like this:
df <- data.frame(
  Dim1 = c("A", "A", "A", "A", "A", "A", "B", "B"),
  Dim2 = c(100, 100, 100, 100, 200, 200, 100, 200),
  Value = sample(1:10, 8)
)
  Dim1 Dim2 Value
1    A  100     3
2    A  100     6
3    A  100     7
4    A  100     4
5    A  200     8
6    A  200     9
7    B  100     2
8    B  200    10
(The Value column is just to illustrate that each row is a data point; the actual value doesn't matter.) Ultimately, what I would like to do is plot the values against their index within the subset defined by Dim1 and Dim2. For this reason, I think I need to append a new column containing the indices, which would look like this (blank lines added between subsets to make them obvious):
  Dim1 Dim2 Value Index
1    A  100     1     1
2    A  100     9     2
3    A  100     4     3
4    A  100    10     4

5    A  200     7     1
6    A  200     3     2

7    B  100     5     1

8    B  200     8     1
How do I do this elegantly in R? I'm coming from Python and my default approach is to for-loop over the combinations of Dim1 & Dim2, keeping track of the number of rows in each and assigning the maximum encountered so far to each row. I've been trying to figure it out but my vector-fu is weak.
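A rough sketch of that loop-based approach, translated into R, might look like the following. It is only an illustration: the `counts` list and the `key` string are hypothetical helper names, and the Value column uses fixed numbers in place of sample() so the result is reproducible.

```r
# Fixed values instead of sample() so the output is reproducible.
df <- data.frame(
  Dim1 = c("A", "A", "A", "A", "A", "A", "B", "B"),
  Dim2 = c(100, 100, 100, 100, 200, 200, 100, 200),
  Value = c(3, 6, 7, 4, 8, 9, 2, 10)
)

# Hypothetical helper: number of rows seen so far for each Dim1/Dim2 key.
counts <- list()
df$Index <- NA_integer_

for (i in seq_len(nrow(df))) {
  # Build one string key per (Dim1, Dim2) combination, e.g. "A_100".
  key <- paste(df$Dim1[i], df$Dim2[i], sep = "_")
  # A key not seen before has no entry yet, so start from zero.
  seen <- if (is.null(counts[[key]])) 0L else counts[[key]]
  counts[[key]] <- seen + 1L
  df$Index[i] <- counts[[key]]
}

print(df)
```

This works, but it touches the data frame one row at a time, which is exactly what a vectorised solution should avoid.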
This is probably going to look like cheating since I am passing a vector into a function which I then totally ignore except to get its length:
df$Index <- ave(1:nrow(df), df$Dim1, factor(df$Dim2), FUN = function(x) 1:length(x))
The ave function returns a vector of the same length as its first argument, but computed within the categories defined by all of the factors between the first argument and the argument named FUN. (I often forget to put the "FUN=" in for my function and get a cryptic error message along the lines of "unique() applies only to vectors", because ave then treats the anonymous function as one more grouping factor and fails when it tries to determine how many unique values the function possesses.)
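To make the "same length, computed within categories" behaviour concrete, here is a minimal sketch on toy data. The vectors `x` and `g` are made up for illustration and are not part of the question's data.

```r
# Toy data, made up for illustration.
x <- c(10, 20, 30, 40)
g <- c("a", "a", "b", "b")

# Group means, repeated so the result lines up with x element by element.
group_means <- ave(x, g, FUN = mean)
print(group_means)  # 15 15 35 35

# With no FUN given, ave() defaults to mean, so this is the same call.
stopifnot(identical(ave(x, g), group_means))
```

The result has one entry per element of `x`, not one per group, which is why it can be assigned straight into a new data frame column.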
There's actually an even more compact way of expressing function(x) 1:length(x): the seq_along function. It is probably safer too, since it behaves correctly when passed a vector of length zero, returning a zero-length vector (integer(0)), whereas the anonymous function would misbehave by returning 1:0, i.e. c(1, 0):
ave(1:nrow(df), df$Dim1, factor(df$Dim2), FUN = seq_along)
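The zero-length difference is easy to check directly. The sketch below also reruns the ave call on the question's data, with fixed numbers standing in for sample() and with seq_len(nrow(df)) swapped in for 1:nrow(df) for the same safety reason; both substitutions are my own assumptions rather than part of the answer above.

```r
empty <- integer(0)

print(1:length(empty))   # 1 0  (the colon operator counts down from 1 to 0)
print(seq_along(empty))  # integer(0)

# The question's data, with fixed values so the output is reproducible.
df <- data.frame(
  Dim1 = c("A", "A", "A", "A", "A", "A", "B", "B"),
  Dim2 = c(100, 100, 100, 100, 200, 200, 100, 200),
  Value = c(3, 6, 7, 4, 8, 9, 2, 10)
)

# Number the rows within each Dim1/Dim2 combination.
df$Index <- ave(seq_len(nrow(df)), df$Dim1, factor(df$Dim2), FUN = seq_along)
print(df$Index)  # 1 2 3 4 1 2 1 1
```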
Here you go, using data.table:
library(data.table)
df <- data.table(
  Dim1 = c("A", "A", "A", "A", "A", "A", "B", "B"),
  Dim2 = c(100, 100, 100, 100, 200, 200, 100, 200),
  Value = sample(1:10, 8)
)
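df[, Index := seq_len(.N), by = list(Dim1, Dim2)]
Here .N is the number of rows in the current group, so seq_len(.N) numbers the rows 1, 2, ... within each combination of Dim1 and Dim2, and := adds the result to df as a new column by reference.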