data.table and pmin with na.rm=TRUE argument

Tags: r, data.table

I am trying to calculate the minimum across rows using the pmin function and data.table (similar to the post row-by-row operations and updates in data.table) but with a character list of columns using something like the with=FALSE syntax, and with the na.rm=TRUE argument.

library(data.table)

DT <- data.table(x = c(1,1,2,3,4,1,9),
                 y = c(2,4,1,2,5,6,6),
                 z = c(3,5,1,7,4,5,3),
                 a = c(1,3,NA,3,5,NA,2))

> DT
   x y z  a
1: 1 2 3  1
2: 1 4 5  3
3: 2 1 1 NA
4: 3 2 7  3
5: 4 5 4  5
6: 1 6 5 NA
7: 9 6 3  2

I can calculate the minimum across rows using columns directly:

DT[,min_val := pmin(x,y,z,a,na.rm=TRUE)]

giving

> DT
   x y z  a min_val
1: 1 2 3  1       1
2: 1 4 5  3       1
3: 2 1 1 NA       1
4: 3 2 7  3       2
5: 4 5 4  5       4
6: 1 6 5 NA       1
7: 9 6 3  2       2

However, I am trying to do this over a large, automatically generated set of columns, so I want to compute the minimum across an arbitrary list of columns stored in a col_names variable:

col_names <- c("a","y","z")

I can do this:

DT[, col_min := do.call(pmin,DT[,col_names,with=FALSE])]

But it gives me NA values. I can't figure out how to pass the na.rm=TRUE argument into the do.call. I've tried defining the function as

DT[, col_min := do.call(function(x) pmin(x,na.rm=TRUE),DT[,col_names,with=FALSE])]

but this gives me an error. I also tried passing in the argument as an additional element in a list, but I think pmin (or do.call) gets confused between the DT non-standard evaluation of column names and the argument.

Any ideas?

asked Mar 03 '16 17:03

Allen Wang

1 Answer

If we need the minimum value of each row across the whole dataset, call pmin through do.call on .SD, and append na.rm=TRUE as a named list element so that it is passed to pmin as an argument.

DT[, col_min:= do.call(pmin, c(.SD, list(na.rm=TRUE)))]
DT
#   x y z  a col_min
#1: 1 2 3  1       1
#2: 1 4 5  3       1
#3: 2 1 1 NA       1
#4: 3 2 7  3       2
#5: 4 5 4  5       4
#6: 1 6 5 NA       1
#7: 9 6 3  2       2
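Why concatenating works can be seen without data.table at all. The sketch below is only an illustration: `cols`, `args`, `res` and `attempt` are hypothetical names, and the plain list stands in for .SD. do.call passes each list element as one argument and uses the element names as argument names, so a trailing element named na.rm reaches pmin as na.rm=TRUE. The wrapper function(x) from the question fails for the same reason: it receives one argument per column but accepts only x.

```r
# A plain list standing in for .SD: each element plays the role of one column
cols <- list(x = c(1, 2, NA), y = c(4, NA, 6))

# Appending list(na.rm = TRUE) gives a list with elements x, y and na.rm
args <- c(cols, list(na.rm = TRUE))
names(args)

# do.call builds the call pmin(x = ..., y = ..., na.rm = TRUE)
res <- do.call(pmin, args)
res

# The one-argument wrapper is handed both x and y, so the call errors
attempt <- tryCatch(
  do.call(function(x) pmin(x, na.rm = TRUE), cols),
  error = function(e) "error"
)
attempt
```

Here res should hold the row-wise minima with the NA entries ignored, while attempt records that the wrapper call raised an error.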

If we want to do this only for a subset of columns whose names are stored in 'col_names', use .SDcols.

DT[, col_min:= do.call(pmin, c(.SD, list(na.rm=TRUE))), 
                .SDcols= col_names]
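As a self-contained check, the following sketch rebuilds the example table from the question and applies the .SDcols version with col_names set to "a", "y" and "z". It assumes the data.table package is installed; nothing else is needed.

```r
library(data.table)

# Example data from the question
DT <- data.table(x = c(1, 1, 2, 3, 4, 1, 9),
                 y = c(2, 4, 1, 2, 5, 6, 6),
                 z = c(3, 5, 1, 7, 4, 5, 3),
                 a = c(1, 3, NA, 3, 5, NA, 2))

# Columns to take the row-wise minimum over
col_names <- c("a", "y", "z")

# .SD holds only the columns named in .SDcols; na.rm = TRUE is appended
# to the argument list that do.call passes to pmin
DT[, col_min := do.call(pmin, c(.SD, list(na.rm = TRUE))),
   .SDcols = col_names]

DT$col_min
# [1] 1 3 1 2 4 5 2
```

Rows 3 and 6 have NA in column a, so their minima come from y and z alone.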

answered Oct 06 '22 10:10

akrun
