Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using columns in lapply with data.table

Tags:

r

data.table

I have a data.table with several columns (can be a dynamic number of columns) that I am wanting to apply minimum values to. Each column gets its own specific minimum value.

A basic version of my data.table looks like:

library(data.table)
dt <- data.table(
  urn = 1:10,
  col1 = 5:14,
  col2 = 11:20
)

Note: I've only included two columns with values that need to have minimums for the reprex. There could be more columns, hence trying to code this a little bit dynamically.

My approach to this has been nested lapply statements and so forth, but I can't figure out how to call my column within my lapply function.

min_vals <- c(10, 15)
my_cols <- grep("^col", colnames(dt), value = TRUE)

## First approach
dt[,
  (my_cols) := lapply(min_vals, function(x) {
    my_cols[my_cols < x] <- x
  })]

## Second Approach
dt[,
  (my_cols) := lapply(min_vals, function(x) {
    ifelse(my_cols < x, x, my_cols)
  })]

At the moment the first approach sets all rows to the respective min_vals, rather than just the values below the min_vals. The second approach doesn't work at all, but instead errors.

The desired output looks like the following:

dt
#       urn  col1  col2
#     <int> <int> <int>
#  1:     1    10    15
#  2:     2    10    15
#  3:     3    10    15
#  4:     4    10    15
#  5:     5    10    15
#  6:     6    10    16
#  7:     7    11    17
#  8:     8    12    18
#  9:     9    13    19
# 10:    10    14    20

It would be great to have a solution that allows for a dynamic numner of columns that match the col<number> format.

like image 307
Dan Avatar asked Apr 19 '26 14:04

Dan


2 Answers

We could use Map instead of lapply and, as @Roland in the comments points out, i) loop over .SD as first input (specifying .SDcols with mycols) and use ii) the faster fielse instead of ifelse.

library(data.table)

dt[, (my_cols) := Map(\(x, m) fifelse(x < m, m, x),
                      .SD,
                      min_vals),
   .SDcols = my_cols]

dt
#>     urn col1 col2
#>  1:   1   10   15
#>  2:   2   10   15
#>  3:   3   10   15
#>  4:   4   10   15
#>  5:   5   10   15
#>  6:   6   10   16
#>  7:   7   11   17
#>  8:   8   12   18
#>  9:   9   13   19
#> 10:  10   14   20

@Mwavu suggests another great option using pmax instead of fifelse in the comments:

dt[, c(my_cols) := Map(\(x, y) pmax(x, y), .SD, min_vals), .SDcols = my_cols]
#> same result as above

Data from OP

dt <- data.table(
  urn = 1:10,
  col1 = 5:14,
  col2 = 11:20
)

min_vals <- c(10, 15)
my_cols <- grep("^col", colnames(dt), value = TRUE)

Created on 2023-03-20 with reprex v2.0.2

like image 97
TimTeaFan Avatar answered Apr 21 '26 05:04

TimTeaFan


Here you're actually trying to loop through two lists/vectors at once which lapply isn't designed to do. So here mapply would be useful:

library("data.table")

dt <- data.table(urn = 1:10,
                 col1 = 5:14,
                 col2 = 11:20)

min_vals <- c(10, 15)
my_cols <- grep("^col", colnames(dt), value = TRUE)

dt[,
   (my_cols) := mapply(function(x, y) {
     ifelse(x < y, y, x)
   },
   x = .SD,
   y = min_vals,
   SIMPLIFY = FALSE), .SDcols = my_cols]

To make sure you apply your function only to the columns you want, you also need to tell it to subset the data to only keep the columns you want (.SD is the subset of your data containing the columns in my_cols).

like image 20
rps1227 Avatar answered Apr 21 '26 06:04

rps1227