I'd like to apply a couple functions to a column but I want to apply some logic as to when I do this, in this case when another column has some NA's. To illustrate I'll add some NA to the iris
dataset and turn it into a data.table:
library(data.table)
irisdt <- iris
## Prep some example data
irisdt[irisdt$Sepal.Length < 5,]$Sepal.Length <- NA
irisdt[irisdt$Sepal.Width < 3,]$Sepal.Width <- NA
## Turn this into a data.table
irisdt <- as.data.table(iris)
If I wanted to apply max
to multiple columns I'd go like this:
## Apply a function to individual columns
irisdt[, lapply(.SD, max), .SDcols = c("Petal.Length", "Petal.Width")]
#> Petal.Length Petal.Width
#> 1: 6.9 2.5
In this case however I'd like to take out any row that isn't an NA in Sepal.Length
and then return max and min along with the name of the column I subset for NA's. Below is an ugly way of implementing this but hopefully illustrates what I am after:
## Here is what the table would look like
desired_table <- rbind(
irisdt[!is.na(Sepal.Length), .(max = max(Petal.Length), min = min(Petal.Length), var = "Sepal.Length")],
irisdt[!is.na(Sepal.Width), .(max = max(Petal.Length), min = min(Petal.Length), var = "Sepal.Width")]
)
desired_table
#> max min var
#> 1: 6.9 1.2 Sepal.Length
#> 2: 6.7 1.0 Sepal.Width
Created on 2020-01-14 by the reprex package (v0.3.0)
Any thoughts on how I might accomplish this?
Apply any function to all R data frame You can set the MARGIN argument to c(1, 2) or, equivalently, to 1:2 to apply the function to each value of the data frame. If you set MARGIN = c(2, 1) instead of c(1, 2) the output will be the same matrix but transposed.
SD stands for "Subset of Data. table". The dot before SD has no significance but doesn't let it clash with a user-defined column name.
melt
may be better option if we are comparing by multiple columns. Reshape into 'long' format, then use i
with the condition !is.na(value)
, while grouping by 'variable' and get the min
and max
of the specified variable
library(data.table)
melt(irisdt, measure = c('Sepal.Length', 'Sepal.Width'))[!is.na(value),
.(max = max(Petal.Length), min = min(Petal.Length)), .(variable)]
If we are doing this for multiple variables, then use the lapply(.SD, ...
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With