I have a custom S3 class which is for all intents and purposes a data.frame
that has an attribute which has a value for each data frame row. For example,
my_df <- data.frame("a" = c(1, 2, 3), "b" = c(4, 5, 6))
class(my_df) <- c("foo", "data.frame")
attributes(my_df)$"row_notes" <- list("good", "bad", "good")
I have defined a subsetting function for my custom class as follows:
`[.foo` <- function(x, i, j, ...) {
result <- NextMethod()
if (!missing(i)) {
attributes(result)$"row_notes" <- attributes(x)$"row_notes"[i]
} else {
attributes(result)$"row_notes" <- attributes(x)$"row_notes"
}
return(result)
}
This seems to work just fine so far. If I do something like my_df[1:2, 2]
, the "row_notes"
attribute will get subsetted down to just c("good", "bad")
.
I would like to be able to make use of dplyr::select()
and functions like dplyr::everything()
and dplyr::all_of()
on my custom data frame, but I can't seem to use these functions without destroying the "row_notes"
attribute.
Intriguingly, the result of:
attributes(dplyr::select(my_df, dplyr::everything()))$"row_notes"
Is a two element list. I tried updating my subsetting function to debug what may be happening:
`[.foo` <- function(x, i, j, ...) {
if (!missing(i)) {
cat("i:", i, "\n")
}
if (!missing(j)) {
cat("j:", j, "\n")
}
cat("-------\n")
result <- NextMethod()
if (!missing(i)) {
attributes(result)$"row_notes" <- attributes(x)$"row_notes"[i]
} else {
attributes(result)$"row_notes" <- attributes(x)$"row_notes"
}
return(result)
}
And it turns out that for whatever reason, calling dplyr::select(df, dplyr::everything())
will at some point call [.foo
with the an i
value equal to the number of columns of the data frame, not the number of rows.
Is there a way to work around this that doesn't require telling users just not to try using dplyr::select
on this essentially data frame-like object?
To be clear, I'd simply like to have a subsetting function where whatever subsetting happens to the rows of the data frame is correspondingly applied to the "row_notes"
attribute. I will also add that this is a toy example, if it were not I would probably just keep "row_notes"
as another column in the data frame - I cannot do this in my actual work as "row_notes"
represents a variety of object types including a list of matrices.
Any help would be greatly appreciated, thank you!
The main issue is that you need the [.foo
data frame method to handle 1d subsetting (e.g. my_df[1]
and my_df["a"]
). You can see that your method currently fails in these cases and this is also why dplyr::select()
fails:
library(dplyr)
my_df <- data.frame("a" = c(1, 2, 3), "b" = c(4, 5, 6))
class(my_df) <- c("foo", "data.frame")
# changed below to make unique and to a vector for prettier printing
attributes(my_df)$"row_notes" <- c("good", "bad", "worst")
my_df[1] |> attr("row_notes")
# [1] "good"
my_df["a"] |> attr("row_notes")
# NULL
It will work if it's rewritten to check the number of arguments and that i
is not missing:
`[.foo` <- function(x, i, j, ...) {
result <- NextMethod()
if (nargs() == 2 && !missing(i)) {
attr(result, "row_notes") <- attr(x, "row_notes")
} else {
attr(result, "row_notes") <- attr(x, "row_notes")[i]
}
return(result)
}
Now it should work for base subsetting and with dplyr::select()
my_df[1] |> attr("row_notes")
# [1] "good" "bad" "worst"
my_df["a"] |> attr("row_notes")
# [1] "good" "bad" "worst"
my_df[1:2, ] |> attr("row_notes")
# [1] "good" "bad"
my_df |> select(a) |> attr("row_notes")
# [1] "good" "bad" "worst"
However, because many dplyr functions strip off custom classes and attributes, if you want to use other dplyr verbs you need to provide methods for them. See help("dplyr_extending")
. Providing a foo method for dplyr_row_slice
will get arrange()
, filter()
, slice()
(and the rest of the slice_*()
family), semi_join()
, and anti_join()
working. You also need to provide methods for group_by()
and ungroup()
if you want to use these.
restore_foo <- function(data) {
class(data) <- union("foo", class(data))
data
}
dplyr_row_slice.foo <- function(data, i, ...) {
result <- NextMethod()
attr(result, "row_notes") <- attr(data, "row_notes")[i]
restore_foo(result)
}
group_by.foo <- function(.data, ...) {
restore_foo(NextMethod())
}
ungroup.foo <- function(x, ...) {
result <- NextMethod()
attributes(result) <- attributes(x)
restore_foo(result)
}
Testing:
my_df |> slice(c(1, 3)) |> attr("row_notes")
# [1] "good" "worst"
my_df |> mutate(grp = a == 2) |> filter(row_number() == 1, .by = grp) |> attr("row_notes")
# [1] "good" "bad"
my_df |> group_by(a == 2) |> filter(row_number() == 1) |> ungroup() |> attr("row_notes")
# [1] "good" "bad"
my_df |> arrange(desc(a)) |> attr("row_notes")
# [1] "worst" "bad" "good"
my_df |> group_split(a == 2) |> lapply(attr, "row_notes")
# [[1]]
# [1] "good" "worst"
#
# [[2]]
# [1] "bad"
If you need more dplyr functions to work with your class you may need to provide further methods.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With