I'll start my question by recalling what "recycling" is. For that, I will quote another user (from Brian Diggs' question about Implementation of standard recycling rules):
One nice feature of R which is related to its inherent vectorized nature is the recycling rule described in An Introduction to R in Section 2.2.
Vectors occurring in the same expression need not all be of the same length. If they are not, the value of the expression is a vector with the same length as the longest vector which occurs in the expression. Shorter vectors in the expression are recycled as often as need be (perhaps fractionally) until they match the length of the longest vector. In particular a constant is simply repeated.
I would agree that recycling is a great feature and it makes life a lot easier. But I know a lot of people who also consider it dangerous, and I see their point: sometimes, it would be nice if R could warn you when, for example, you are trying to add a vector to a matrix, because it is not the most natural thing to do.
My question: is it possible to make R send warnings whenever it recycles?
Currently, R only warns when the longer object's length is not a multiple of the shorter object's length. I'd like something that warns in all cases. I have looked into options but had no luck.
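To make the current behaviour concrete, here is a small sketch of the cases described above. The values are made up for illustration, and `tryCatch()` is used only to capture the warning text so it can be printed.

```r
# Lengths 4 and 2: 4 is a multiple of 2, so R recycles silently.
silent <- c(1, 2, 3, 4) + c(10, 20)
print(silent)

# Lengths 3 and 2: not a multiple, so R recycles and also warns.
# tryCatch() is used here only to capture the warning message.
msg <- tryCatch(
  c(1, 2, 3) + c(10, 20),
  warning = function(w) conditionMessage(w)
)
print(msg)

# A vector added to a matrix is recycled down the columns. No warning is
# raised because the matrix has 6 elements, a multiple of the vector length 2.
m <- matrix(1:6, nrow = 2)
print(m + c(100, 200))
```

The first and last cases are the ones that can go unnoticed, since nothing is reported.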
Summary (multi-part answer):
Full answers:
Probably not practically. Of course, R is open source, so you can rewrite it to always warn when recycling. But since recycling is so fundamental to R, doing so would probably cause more problems than it's worth.
But, you can make functions to handle cases in which you really want to avoid recycling. To avoid recycling in your function, simply explicitly check lengths:
df <- data.frame(a = c(1:4), b = letters[1:4])
add_column <- function(df, name, x) {
  df_length <- nrow(df)
  x_length <- length(x)
  if (df_length != x_length) {
    stop("Length of vector different than nrows of dataframe")
  }
  df[name] <- x
  return(df)
}
df <- add_column(df, "grp", "Y")
# Outputs:
# Error in add_column(df, "grp", "Y") :
# Length of vector different than nrows of dataframe
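The same explicit length check can be applied to plain vector arithmetic. The following is only a sketch: `strict_op` is a hypothetical helper name, not part of base R, and allowing length-1 operands is an assumption made here because scalar recycling is so common.

```r
# Hypothetical helper (not part of base R): apply a binary operator, but
# refuse to recycle unless one of the operands has length 1.
strict_op <- function(x, y, op = `+`) {
  nx <- length(x)
  ny <- length(y)
  if (nx != ny && nx != 1L && ny != 1L) {
    stop("lengths differ (", nx, " vs ", ny, "); refusing to recycle")
  }
  op(x, y)
}

strict_op(1:4, 10)        # a length-1 operand is still allowed
strict_op(1:4, 1:4, `*`)  # equal lengths are fine
# strict_op(1:4, 1:2) would stop with an error instead of recycling silently
```

Dropping the two length-1 conditions would make the helper reject every form of recycling, including constants.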
However, since the length of your data is arbitrary, in many real-life cases recycling does not happen silently: it is rare* for the length of one input vector to be an exact multiple of the length of the other, so R warns. (And for data frames, it is actually an error and not just a warning):
df$condition <- c("good", "bad", "so-so")
# Error in `$<-.data.frame`(`*tmp*`, condition, value = c("good", "bad", "so-so")) :
# replacement has 3 rows, data has 4
*Rare, except when the length of the shorter vector is 1, of course (see next point).
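If the existing warning is too easy to overlook, it can be turned into an error for a single expression. This is a sketch under assumptions: `warnings_as_errors` is a hypothetical wrapper name, and it escalates every warning raised by the expression, not only the recycling one. Setting `options(warn = 2)` has a similar effect for the whole session.

```r
# Hypothetical wrapper: evaluate an expression and turn any warning it
# raises into an error. This catches ALL warnings, not just recycling ones.
warnings_as_errors <- function(expr) {
  tryCatch(
    expr,
    warning = function(w) stop(conditionMessage(w), call. = FALSE)
  )
}

ok <- warnings_as_errors(c(1, 2, 3, 4) + c(1, 2))  # exact multiple: no warning
print(ok)

# Lengths 3 and 2 trigger the recycling warning, which now stops execution.
res <- try(warnings_as_errors(c(1, 2, 3) + c(1, 2)), silent = TRUE)
print(inherits(res, "try-error"))
```

Note that this does not help with the silent case, where the longer length is an exact multiple of the shorter one.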
Are you sure you want to warn all the time? When I started learning R, I remember hearing about recycling, but it was a long time before I realized just how extremely common it is. Recycling is one of the features that makes R, well, R. It's a fundamental feature that allows you to seamlessly combine what seem like individual values with "vectors" of values:
Just "single" values:
a <- 1
b <- 2
a + b
# Outputs: [1] 3
Mixed "single" values and "vectors" of values:
a <- c(1, 2, 3, 4)
b <- 2
a + b # b gets recycled
# Outputs: [1] 3 4 5 6
And, I don't know about your work, but in mine sometimes we want to create a column in a dataframe with a default value:
df <- data.frame( a = c(1:4), b = letters[1:4] )
df
# Outputs:
# a b
# 1 1 a
# 2 2 b
# 3 3 c
# 4 4 d
df$group <- "X" # Here "X" gets recycled
df
# Outputs:
# a b group
# 1 1 a X
# 2 2 b X
# 3 3 c X
# 4 4 d X
p.s. I did not realize that this question was over a decade old until I was halfway through answering it.