
unconditional warnings when recycling

Tags: r, recycle

I'll start my question by recalling what "recycling" is, and for that, I will quote another user (re: Brian Diggs' question about Implementation of standard recycling rules):

One nice feature of R which is related to its inherent vectorized nature is the recycling rule described in An Introduction to R in Section 2.2.

Vectors occurring in the same expression need not all be of the same length. If they are not, the value of the expression is a vector with the same length as the longest vector which occurs in the expression. Shorter vectors in the expression are recycled as often as need be (perhaps fractionally) until they match the length of the longest vector. In particular a constant is simply repeated.

I would agree that recycling is a great feature and it makes life a lot easier. But I know a lot of people who also consider it dangerous, and I see their point: sometimes, it would be nice if R could warn you when, for example, you are trying to add a vector to a matrix, because it is not the most natural thing to do.

My question: is it possible to make R send warnings whenever it recycles?

Currently, R only warns when the longer object length is not a multiple of the shorter object length. I'd like something that warns in all cases. I have looked into options(), but no luck.

flodel asked Dec 19 '25 08:12


1 Answer

Summary (multi-part answer):

  1. Globally and in practice: probably not.
  2. You can write functions that throw an error instead of recycling.
  3. Because data lengths are arbitrary, the default behavior already warns (or, for data frames, errors) in many cases.
  4. Recycling is a fundamental R feature, so useful and prevalent that you might not really want a warning every time.

Full answers:

  1. Probably not, practically speaking. Of course, R is open source, so you could modify it to always warn when recycling. But since recycling is so fundamental to R, that would probably cause more problems than it solves.

  2. But you can write functions to handle the cases in which you really want to avoid recycling. To avoid recycling in your function, check the lengths explicitly:

    df <- data.frame(a = c(1:4), b = letters[1:4])
    
    # Add column `name` to `df`, refusing to recycle `x`
    add_column <- function(df, name, x) {
        df_length <- nrow(df)
        x_length <- length(x)
        # Stop unless `x` supplies exactly one value per row
        if (df_length != x_length) {
            stop("Length of vector different than nrows of dataframe")
        }
        df[name] <- x
        return(df)
    }
    
    df <- add_column(df, "grp", "Y")
    
    # Outputs:
    # Error in add_column(df, "grp", "Y") : 
    #  Length of vector different than nrows of dataframe
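
If you would rather get a warning than an error, the same length check can be wrapped around an operator in code you control. This is only a sketch: `warn_on_recycle()` and `%+!%` are hypothetical names made up for illustration, not part of base R, and the check only fires where you actually call them.

```r
# Hypothetical helper (not part of base R): apply a binary operator,
# warning whenever the operands differ in length, i.e. whenever R
# would have to recycle one of them.
warn_on_recycle <- function(op, x, y) {
    if (length(x) != length(y)) {
        warning(sprintf("recycling: operand lengths %d and %d differ",
                        length(x), length(y)), call. = FALSE)
    }
    op(x, y)
}

# Hypothetical infix wrapper around `+`
`%+!%` <- function(x, y) warn_on_recycle(`+`, x, y)

1:4 %+!% 1:4   # same length: no warning
1:4 %+!% 2     # length 4 vs 1: warns, then returns 3 4 5 6
```

Because `length()` of a matrix is its number of elements, adding a shorter vector to a matrix through this wrapper also triggers the warning.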
    
  3. However, since the length of your data is arbitrary, in many real-life cases recycling does not happen silently: it is rare* for the length of one input vector to be an exact multiple of the length of the other, so R warns. (And for data frames, it is actually an error and not just a warning):

    df$condition <- c("good", "bad", "so-so")
    # Error in `$<-.data.frame`(`*tmp*`, condition, value = c("good", "bad", "so-so")) : 
    #  replacement has 3 rows, data has 4
    

*Rare, except when the shorter vector has length 1, of course (see the next point).
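
For plain vectors, the difference between the silent case and the warning case looks roughly like this (a small illustration with made-up values; the warning text assumes an English locale):

```r
x <- 1:6

# 6 is a multiple of 3, so 1:3 is recycled silently
x + 1:3
# [1] 2 4 6 5 7 9

# 6 is not a multiple of 4, so R recycles but warns
x + 1:4
# Warning: longer object length is not a multiple of shorter object length
# [1] 2 4 6 8 6 8
```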

  4. Are you sure you want a warning every time? When I started learning R, I remember hearing about recycling, but it was a long time before I realized just how extremely common it is. Recycling is one of the features that make R, well, R. It's a fundamental feature that allows you to seamlessly combine what seem like individual values with "vectors" of values:

    Just "single" values:

    a <- 1
    b <- 2
    
    a + b
    # Outputs: [1] 3
    

    Mixed "single" values and "vectors" of values:

    a <- c(1, 2, 3, 4)
    b <- 2
    
    a + b # b gets recycled
    # Outputs: [1] 3 4 5 6
    

    And, I don't know about your work, but in mine sometimes we want to create a column in a dataframe with a default value:

    df <- data.frame( a = c(1:4), b = letters[1:4] )
    df
    
    # Outputs:
    #   a b
    # 1 1 a
    # 2 2 b
    # 3 3 c
    # 4 4 d
    
    df$group <- "X" # Here "X" gets recycled
    df
    
    # Outputs:
    #   a b group
    # 1 1 a     X
    # 2 2 b     X
    # 3 3 c     X
    # 4 4 d     X
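
If you want to keep this convenient length-1 case but reject every other kind of recycling, the check from point 2 can be relaxed a little. The following is a sketch under that assumption; `add_column_strict()` is a hypothetical name, not an existing function.

```r
# Hypothetical variant of add_column(): accepts a length-1 default or a
# full-length vector, and rejects every length in between.
add_column_strict <- function(df, name, x) {
    n <- nrow(df)
    if (length(x) != 1L && length(x) != n) {
        stop(sprintf("`x` must have length 1 or %d, not %d", n, length(x)))
    }
    df[[name]] <- x
    df
}

df <- data.frame(a = 1:4, b = letters[1:4])

# A length-1 default is still recycled to all four rows
df <- add_column_strict(df, "group", "X")

# A length-2 vector would be recycled silently by `$<-`; here it is an error
msg <- tryCatch(add_column_strict(df, "half", 1:2),
                error = function(e) conditionMessage(e))
```

This keeps the everyday "default value" idiom working while catching the partial recycling that base R would otherwise accept without comment when the lengths divide evenly.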
    

p.s. I did not realize that this question was over a decade old until I was halfway through answering it.

Christopher Bottoms answered Dec 21 '25 22:12


