I am trying to dynamically form a formula to use in dynlm. I encounter a behaviour of the function formula that I do not understand, which can be seen from this code:
library(data.table)
dt_test <- data.table("a"=rnorm(10), "b"=1:5)
dt_test[, .(.(
formula("z_val ~ s_val + q_val + L(s_dval, 32:0) + L(q_dval, 2:0) + 1 + tt + tt2")
)), .(b)]
The code above is expected to produce (identical) formulas for each value of b. This formula is enclosed in .(.(...)) to return a list, just so that it can be properly stored in a column from the original data.table.
However, the formula returned does not match the string originally provided: a comma is added between the + and tt, as you can see from the output:
b V1
<int> <list>
1: 1 z_val ~ s_val + q_val + L(s_dval, 32:0) + L(q_dval, 2:0) + 1 + , tt + tt2
2: 2 z_val ~ s_val + q_val + L(s_dval, 32:0) + L(q_dval, 2:0) + 1 + , tt + tt2
3: 3 z_val ~ s_val + q_val + L(s_dval, 32:0) + L(q_dval, 2:0) + 1 + , tt + tt2
4: 4 z_val ~ s_val + q_val + L(s_dval, 32:0) + L(q_dval, 2:0) + 1 + , tt + tt2
5: 5 z_val ~ s_val + q_val + L(s_dval, 32:0) + L(q_dval, 2:0) + 1 + , tt + tt2
Essentially, it adds a comma where there is none. It does so even when I rearrange the terms of the sum, but it stops doing it if I remove q_val, for example. The same goes for as.formula.
I would like to understand what is going on and avoid it.
This is just a cosmetic printing issue caused by the way R displays long formulas.
If you run:
formula(paste0("z_val ~ s_val + q_val + L(s_dval, 32:0) + L(q_dval, 2:0) + 1 + tt + tt2"))
You will see that R prints it on 2 lines by default, breaking just before "tt + tt2" (no matter how wide the console is):
#z_val ~ s_val + q_val + L(s_dval, 32:0) + L(q_dval, 2:0) + 1 +
# tt + tt2
This line break comes from the way R deparses the formula for display; it is not part of the formula itself. If you run deparse on it, the result is a character vector of length 2:
deparse(formula(paste0("z_val ~ s_val + q_val + L(s_dval, 32:0) + L(q_dval, 2:0) + 1 + tt + tt2")))
# [1] "z_val ~ s_val + q_val + L(s_dval, 32:0) + L(q_dval, 2:0) + 1 + "
# [2] " tt + tt2"
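To check that the break is only a display artifact, one can ask for a single-line representation instead. This is a minimal sketch in base R; the names f, pieces and one_line are just for illustration, and deparse1 assumes R >= 4.0.0:

```r
# The formula from the question; the terms do not need to exist for parsing
f <- formula("z_val ~ s_val + q_val + L(s_dval, 32:0) + L(q_dval, 2:0) + 1 + tt + tt2")

# With the default width.cutoff (60), deparse() splits the long formula
pieces <- deparse(f)
length(pieces)

# A larger cutoff (500 is the maximum) keeps it in one piece
one_line <- deparse(f, width.cutoff = 500L)
length(one_line)

# deparse1() (R >= 4.0.0) always collapses the result to a single string
deparse1(f)
```

If a one-line label is needed for display, keeping such a string next to the formula may be easier than relying on the default print method.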
However, assigning the result of your original code to df_formulas, you will see that it stores the formula as normal:
df_formulas <- dt_test[, .(.(
formula("z_val ~ s_val + q_val + L(s_dval, 32:0) + L(q_dval, 2:0) + 1 + tt + tt2")
)), .(b)]
df_formulas[[2]]
# [[1]]
# z_val ~ s_val + q_val + L(s_dval, 32:0) + L(q_dval, 2:0) + 1 +
# tt + tt2
# <environment: 0x7fa96ff6ffd8>
#
# [[2]]
# z_val ~ s_val + q_val + L(s_dval, 32:0) + L(q_dval, 2:0) + 1 +
# tt + tt2
# <environment: 0x7fa96ff6ffd8>
# ....
As you mentioned, this is also why you don't see the comma if you remove some of the variables in the formula code - it has nothing to do with what specifically you are removing, you're simply reducing the length sufficiently to avoid the automatic line break.
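The length dependence can be checked directly in base R. The sketch below uses a made-up shorter formula (short_f) for comparison. The last step is only an assumption about how a table printer could end up showing a comma, namely by joining the deparsed pieces with a separator; it is not taken from the data.table source:

```r
long_f  <- formula("z_val ~ s_val + q_val + L(s_dval, 32:0) + L(q_dval, 2:0) + 1 + tt + tt2")
short_f <- formula("z_val ~ s_val + tt")  # hypothetical short formula

# The long formula is split into two pieces, the short one is not
length(deparse(long_f))
length(deparse(short_f))

# The formula object itself is intact: every variable is still there
all.vars(long_f)

# Assumption for illustration: joining the pieces with ", " gives a
# string with a comma after the last "+" of the first piece
joined <- paste(deparse(long_f), collapse = ", ")
joined
```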
Maybe you want to add a list column, something like this:
> library(data.table)
> dt_test <- data.table("a"=rnorm(10), "b"=1:5)
> dt_test[, x := list(rep(list(as.formula("z_val ~ s_val + q_val + L(s_dval, 32:0) + L(q_dval, 2:0) + 1 + tt + tt2")), .N))]
> dt_test$x[[1]]
z_val ~ s_val + q_val + L(s_dval, 32:0) + L(q_dval, 2:0) + 1 +
tt + tt2
<environment: 0x56169e0ce3c8>
This looks weird during printing,
> dt_test |> head(2)
a b x
<num> <int> <list>
1: -0.5439367 1 z_val ~ s_val + q_val + L(s_dval, 32:0) + L(q_dval, 2:0) + 1 + , tt + tt2
2: 0.1078461 2 z_val ~ s_val + q_val + L(s_dval, 32:0) + L(q_dval, 2:0) + 1 + , tt + tt2
but actually is a formula:
> class(dt_test$x[[1]])
[1] "formula"
You might adapt that to your dynamic .(b) grouping. I am not sure whether you need rep in that case; data.table doesn't like recycling, so it's needed in this example.
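One possible adaptation to a grouped call is sketched below. It assumes data.table is installed; the per-group lag order taken from b and the names dt_forms, f and label are hypothetical and only serve to make the formulas differ between groups:

```r
library(data.table)

dt_test <- data.table(a = rnorm(10), b = rep(1:5, 2))

# One formula per group; list() wraps it so it fits into a list column
dt_forms <- dt_test[, .(f = list(as.formula(
  paste0("z_val ~ s_val + L(s_dval, ", b, ":0)")  # hypothetical: lag order from b
))), by = b]

# The cells are real formula objects
class(dt_forms$f[[1]])

# Optional one-line labels for display (deparse1 needs R >= 4.0.0)
dt_forms[, label := vapply(f, deparse1, character(1))]
```

Because the grouped j expression returns exactly one list element per group, rep is not needed in this variant.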