Unexpected behaviour of "formula" with "data.table" in R

Question

I am trying to dynamically build a formula to use in dynlm. I have run into behaviour that I do not understand, which the following code reproduces:

library(data.table)
dt_test <- data.table("a"=rnorm(10), "b"=1:5)

dt_test[, .(.(
   formula("z_val ~ s_val + q_val + L(s_dval, 32:0) + L(q_dval, 2:0) + 1 + tt + tt2")
 )), .(b)]

The code above is expected to produce (identical) formulas for each value of b. This formula is enclosed in .(.(...)) to return a list, just so that it can be properly stored in a column from the original data.table.

However, the formula returned does not match the string originally provided: a comma appears between the + and tt, as you can see from the output:

       b                                                                           V1
   <int>                                                                       <list>
1:     1 z_val ~ s_val + q_val + L(s_dval, 32:0) + L(q_dval, 2:0) + 1 + ,    tt + tt2
2:     2 z_val ~ s_val + q_val + L(s_dval, 32:0) + L(q_dval, 2:0) + 1 + ,    tt + tt2
3:     3 z_val ~ s_val + q_val + L(s_dval, 32:0) + L(q_dval, 2:0) + 1 + ,    tt + tt2
4:     4 z_val ~ s_val + q_val + L(s_dval, 32:0) + L(q_dval, 2:0) + 1 + ,    tt + tt2
5:     5 z_val ~ s_val + q_val + L(s_dval, 32:0) + L(q_dval, 2:0) + 1 + ,    tt + tt2

Essentially, it adds a comma where there is none. It does so even if I rearrange the terms of the sum, but it stops if I remove q_val, for example. The same happens with as.formula.

I would like to understand what is going on and avoid it.

jpsmith · Accepted Answer

This is just a cosmetic printing issue caused by the way R prints long formulas.

If you run:

formula(paste0("z_val ~ s_val + q_val + L(s_dval, 32:0) + L(q_dval, 2:0) + 1 + tt + tt2"))

You will see that R prints it on two lines by default, breaking before "tt + tt2" (no matter how wide the console is):

#z_val ~ s_val + q_val + L(s_dval, 32:0) + L(q_dval, 2:0) + 1 + 
#    tt + tt2

The line break is more than a console effect: it is part of how R converts the formula to text. If you run deparse, it returns a character vector of length 2:

deparse(formula(paste0("z_val ~ s_val + q_val + L(s_dval, 32:0) + L(q_dval, 2:0) + 1 + tt + tt2")))

# [1] "z_val ~ s_val + q_val + L(s_dval, 32:0) + L(q_dval, 2:0) + 1 + "
# [2] "    tt + tt2"
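The two-element result also hints at where the comma comes from. As a sketch, assuming (the answer does not state this) that the table print joins the pieces of each list cell with a comma, collapsing the deparsed lines by hand reproduces the string seen in the table. deparse1, available in base R 4.0.0 and later, returns the formula as a single string instead. The names formula_text, pieces, joined and one_line are introduced only for this illustration:

```r
# Build the same formula as in the question
formula_text <- "z_val ~ s_val + q_val + L(s_dval, 32:0) + L(q_dval, 2:0) + 1 + tt + tt2"
f <- formula(formula_text)

# deparse() splits long expressions into several strings (default width.cutoff = 60)
pieces <- deparse(f)
length(pieces)

# Joining the pieces with a comma puts a "," exactly at the line break,
# which mirrors the text shown in the data.table print
joined <- paste(pieces, collapse = ",")
joined

# deparse1() uses a wide cutoff and returns a single string
one_line <- deparse1(f)
one_line
```

If the assumption holds, the comma is a separator between two pieces of text and never part of the formula object.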

However, if you assign the result of your original code to dt_formulas, you will see that the formula itself is stored intact:

dt_formulas <- dt_test[, .(.(
  formula("z_val ~ s_val + q_val + L(s_dval, 32:0) + L(q_dval, 2:0) + 1 + tt + tt2")
)), .(b)]

dt_formulas[[2]]

# [[1]]
# z_val ~ s_val + q_val + L(s_dval, 32:0) + L(q_dval, 2:0) + 1 +
#   tt + tt2
# <environment: 0x7fa96ff6ffd8>
#   
# [[2]]
# z_val ~ s_val + q_val + L(s_dval, 32:0) + L(q_dval, 2:0) + 1 +
#   tt + tt2
# <environment: 0x7fa96ff6ffd8>
# ....
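A way to confirm this without relying on printed output is to inspect one stored object directly. The sketch below rebuilds the table with an explicit rep() for b and a helper variable formula_text; both are choices made for this example, not part of the original code:

```r
library(data.table)

formula_text <- "z_val ~ s_val + q_val + L(s_dval, 32:0) + L(q_dval, 2:0) + 1 + tt + tt2"
dt_test <- data.table(a = rnorm(10), b = rep(1:5, 2))

# Same grouped call as in the question: one formula per value of b
dt_formulas <- dt_test[, .(.(formula(formula_text))), by = b]

# The unnamed list column is called V1; take the formula stored for the first group
stored <- dt_formulas$V1[[1]]

class(stored)                              # class of the stored object
all.vars(stored)                           # variable names used in the formula
identical(deparse1(stored), formula_text)  # one-line text compared with the input
```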

As you mentioned, this is also why you don't see the comma if you remove some of the variables from the formula: it has nothing to do with which term you remove; you are simply shortening the formula enough to avoid the automatic line break.
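The effect of length can be checked in base R alone. This is a minimal sketch; short_f is a hypothetical, much shorter formula chosen only for illustration:

```r
long_f  <- formula("z_val ~ s_val + q_val + L(s_dval, 32:0) + L(q_dval, 2:0) + 1 + tt + tt2")
short_f <- formula("z_val ~ s_val + tt")  # hypothetical shorter formula

# Number of text pieces produced with the default cutoff
n_long  <- length(deparse(long_f))   # more than one piece, so there is a break
n_short <- length(deparse(short_f))  # a single piece, so nothing to join

# Raising the cutoff keeps the long formula in one piece as well
n_wide <- length(deparse(long_f, width.cutoff = 500L))

c(long = n_long, short = n_short, wide = n_wide)
```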

jay.sf · Answer

Maybe you want to add a list column, something like this:

> library(data.table)
> dt_test <- data.table("a"=rnorm(10), "b"=1:5)
> dt_test[, x := list(rep(list(as.formula("z_val ~ s_val + q_val + L(s_dval, 32:0) + L(q_dval, 2:0) + 1 + tt + tt2")), .N))]
> dt_test$x[[1]]
z_val ~ s_val + q_val + L(s_dval, 32:0) + L(q_dval, 2:0) + 1 + 
    tt + tt2
<environment: 0x56169e0ce3c8>

This looks weird during printing,

> dt_test |> head(2)
            a     b                                                                            x
        <num> <int>                                                                       <list>
1: -0.5439367     1 z_val ~ s_val + q_val + L(s_dval, 32:0) + L(q_dval, 2:0) + 1 + ,    tt + tt2
2:  0.1078461     2 z_val ~ s_val + q_val + L(s_dval, 32:0) + L(q_dval, 2:0) + 1 + ,    tt + tt2

but actually is a formula:

> class(dt_test$x[[1]])
[1] "formula"

You might adapt that to your dynamic .(b) grouping. I'm not sure whether you need rep in that case; data.table doesn't like recycling, so it's needed in this example.
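One possible adaptation to the grouped case, as a sketch: wrapping the formula in list() inside j yields one list cell per group, so rep is not needed there. The x_text column is an addition for this example (not part of the answer) that keeps a one-line text version next to each formula so the printed table stays readable. formula_text and by_group are likewise illustrative names:

```r
library(data.table)

formula_text <- "z_val ~ s_val + q_val + L(s_dval, 32:0) + L(q_dval, 2:0) + 1 + tt + tt2"
dt_test <- data.table(a = rnorm(10), b = rep(1:5, 2))

# One formula per value of b, stored in a list column named x
by_group <- dt_test[, .(x = list(as.formula(formula_text))), by = b]

# Optional: a character column with the one-line text of each formula
by_group[, x_text := vapply(x, deparse1, character(1))]

class(by_group$x[[1]])
by_group$x_text[1]
```

Keeping both columns means the list column can be passed to the model-fitting code while the character column is used for display.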
