I have a data frame like this:
df <- data.frame(
Dim1 = c("A","A","A","A","A","A","B","B"),
Dim2 = c(100,100,100,100,200,200,100,200),
Value = sample(1:10, 8)
)
  Dim1 Dim2 Value
1    A  100     3
2    A  100     6
3    A  100     7
4    A  100     4
5    A  200     8
6    A  200     9
7    B  100     2
8    B  200    10
(The Value column is just there to illustrate that each row is a data point; the actual value doesn't matter.) Ultimately, what I would like to do is plot the values against their index within the subset defined by Dim1 and Dim2. For this reason, I think I need to append a new column containing the indices, which would look like this (I added blank lines between rows to make the subsets obvious):
  Dim1 Dim2 Value Index
1    A  100     1     1
2    A  100     9     2
3    A  100     4     3
4    A  100    10     4

5    A  200     7     1
6    A  200     3     2

7    B  100     5     1

8    B  200     8     1
How do I do this elegantly in R? I'm coming from Python and my default approach is to for-loop over the combinations of Dim1 & Dim2, keeping track of the number of rows in each and assigning the maximum encountered so far to each row. I've been trying to figure it out but my vector-fu is weak.
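For comparison, here is a minimal sketch of the loop-based approach described above, translated to R. The `paste()` group key and the `counts` lookup vector are illustrative choices of mine, not part of the question; the vectorized answers are the more idiomatic route.

```r
# Same layout as the example data; Value is random and irrelevant here
df <- data.frame(
  Dim1 = c("A","A","A","A","A","A","B","B"),
  Dim2 = c(100,100,100,100,200,200,100,200),
  Value = sample(1:10, 8)
)

counts <- integer(0)     # running row count per group, keyed by a group label
df$Index <- NA_integer_
for (i in seq_len(nrow(df))) {
  # Illustrative key for the (Dim1, Dim2) combination, e.g. "A_100"
  key <- paste(df$Dim1[i], df$Dim2[i], sep = "_")
  # Looking up an unseen key returns NA, so that group starts at 1
  counts[key] <- if (is.na(counts[key])) 1L else counts[key] + 1L
  df$Index[i] <- counts[key]
}
df$Index   # 1 2 3 4 1 2 1 1
```

This works, but it touches one row at a time and needs explicit bookkeeping for every group.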
This is probably going to look like cheating since I am passing a vector into a function which I then totally ignore except to get its length:
df$Index <- ave( 1:nrow(df), df$Dim1, factor( df$Dim2), FUN=function(x) 1:length(x) )
The ave function returns a vector of the same length as its first argument, but computed within the categories defined by all of the factors between the first argument and the argument named FUN. (I often forget to write "FUN=" for my function and get a cryptic error message along the lines of unique() applies only to vectors, because ave then treats the anonymous function as one more grouping factor, tries to determine how many unique values it has, and fails.)
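To make "computed within categories" concrete, it can help to call ave with functions other than an indexer. A minimal sketch, assuming the Value column is fixed to the numbers from the first printed table (sample() would otherwise give different values on every run):

```r
df <- data.frame(
  Dim1 = c("A","A","A","A","A","A","B","B"),
  Dim2 = c(100,100,100,100,200,200,100,200),
  Value = c(3, 6, 7, 4, 8, 9, 2, 10)   # fixed copy of the printed values
)

# Default FUN is mean: each row receives the mean of its (Dim1, Dim2) group
group_mean <- ave(df$Value, df$Dim1, df$Dim2)
group_mean   # 5.0 5.0 5.0 5.0 8.5 8.5 2.0 10.0

# FUN = length: each row receives the size of its group
group_size <- ave(df$Value, df$Dim1, df$Dim2, FUN = length)
group_size   # 4 4 4 4 2 2 1 1
```

In both cases the result has one entry per row, in the original row order, which is why it can be assigned straight back as a new column.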
There's an even more compact way of expressing function(x) 1:length(x): the seq_along function. It is also safer, because for a zero-length vector it correctly returns integer(0), whereas the anonymous function returns 1:0, i.e. c(1, 0):
df$Index <- ave( 1:nrow(df), df$Dim1, factor( df$Dim2), FUN=seq_along )
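A short sketch of the zero-length difference, followed by the same ave call. Inside ave, interaction() converts each grouping vector with as.factor(), so in this sketch I drop the factor() wrapper around Dim2; that simplification is mine, not the answer's.

```r
# Behaviour on an empty vector
empty <- integer(0)
bad  <- 1:length(empty)    # 1:0 counts down, giving c(1L, 0L)
good <- seq_along(empty)   # integer(0), as expected
bad
good

# The seq_along version of the ave call, without factor() around Dim2
df <- data.frame(
  Dim1 = c("A","A","A","A","A","A","B","B"),
  Dim2 = c(100,100,100,100,200,200,100,200),
  Value = sample(1:10, 8)
)
df$Index <- ave(seq_len(nrow(df)), df$Dim1, df$Dim2, FUN = seq_along)
df$Index   # 1 2 3 4 1 2 1 1
```

Using seq_len(nrow(df)) as the first argument follows the same reasoning: it stays correct for a data frame with zero rows, where 1:nrow(df) would again produce c(1, 0).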
Here you go, using data.table:
library(data.table)
df <- data.table(
Dim1 = c("A","A","A","A","A","A","B","B"),
Dim2 = c(100,100,100,100,200,200,100,200),
Value = sample(1:10, 8)
)
df[, index := seq_len(.N), by = list(Dim1, Dim2)]
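Newer data.table releases also provide rowid(), which returns a running count within each combination of its arguments, so the `by =` clause is not needed. A minimal sketch comparing the two forms; the table name `dt` and the column `index2` are mine, chosen so both results can sit side by side:

```r
library(data.table)

dt <- data.table(
  Dim1 = c("A","A","A","A","A","A","B","B"),
  Dim2 = c(100,100,100,100,200,200,100,200)
)

# Grouped form from the answer: .N is the number of rows in the current group
dt[, index := seq_len(.N), by = list(Dim1, Dim2)]

# rowid() numbers rows within each (Dim1, Dim2) combination directly
dt[, index2 := rowid(Dim1, Dim2)]

dt$index2   # 1 2 3 4 1 2 1 1
```

Both columns should be identical; rowid() is simply a shorter way to express the same within-group numbering.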