Apply a function to each column with a condition in data.table [R]

Tags:

r

data.table

I'd like to apply a couple of functions to a column, but only under a condition: in this case, only on rows where another column is not NA. To illustrate, I'll add some NAs to the iris dataset and turn it into a data.table:

library(data.table)

irisdt <- iris
## Prep some example data
irisdt[irisdt$Sepal.Length < 5,]$Sepal.Length <- NA
irisdt[irisdt$Sepal.Width < 3,]$Sepal.Width <- NA

## Turn this into a data.table
irisdt <- as.data.table(irisdt)
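
As an aside, the same example data could also be prepared with data.table's update-by-reference syntax. This is only a sketch of an alternative, not part of the original reprex; `irisdt2` is a hypothetical name chosen so that it does not overwrite `irisdt`:

```r
library(data.table)

# Copy iris into a data.table (irisdt2 is an illustrative name)
irisdt2 <- as.data.table(iris)

# Set values to NA by reference on the rows matching each condition
irisdt2[Sepal.Length < 5, Sepal.Length := NA_real_]
irisdt2[Sepal.Width < 3, Sepal.Width := NA_real_]

# Confirm that both columns now contain NAs
irisdt2[, lapply(.SD, anyNA), .SDcols = c("Sepal.Length", "Sepal.Width")]
```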

If I wanted to apply max to multiple columns I'd go like this:

## Apply a function to individual columns
irisdt[, lapply(.SD, max), .SDcols = c("Petal.Length", "Petal.Width")]
#>    Petal.Length Petal.Width
#> 1:          6.9         2.5

In this case, however, I'd like to keep only the rows where Sepal.Length is not NA, and then return the max and min of Petal.Length along with the name of the column I filtered on. I'd like to do the same for Sepal.Width. Below is an ugly way of implementing this, but it hopefully illustrates what I am after:

## Here is what the table would look like
desired_table <- rbind(
  irisdt[!is.na(Sepal.Length), .(max = max(Petal.Length), min = min(Petal.Length), var = "Sepal.Length")],
  irisdt[!is.na(Sepal.Width), .(max = max(Petal.Length), min = min(Petal.Length), var = "Sepal.Width")]
)

desired_table
#>    max min          var
#> 1: 6.9 1.2 Sepal.Length
#> 2: 6.7 1.0  Sepal.Width

Created on 2020-01-14 by the reprex package (v0.3.0)

Any thoughts on how I might accomplish this?

asked Jan 14 '20 23:01

boshek

1 Answer

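melt may be a better option if we are filtering on multiple columns. Reshape the data into 'long' format so that the filter columns are stacked into a single 'value' column labelled by 'variable'. Then use i with the condition !is.na(value), group by 'variable', and get the min and max of the target column (here Petal.Length).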

library(data.table)
## Stack the two filter columns, drop NA rows, then summarise per filter column
melt(irisdt, measure.vars = c('Sepal.Length', 'Sepal.Width'))[!is.na(value),
   .(max = max(Petal.Length), min = min(Petal.Length)), by = variable]
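
For reference, here is a self-contained sketch of the same approach that also inspects the intermediate long table. It assumes the example data is rebuilt with NAs as intended in the question (using `:=` here for brevity); `long` and `res` are illustrative names, not from the original answer:

```r
library(data.table)

# Rebuild the example data with NAs in the two filter columns
irisdt <- as.data.table(iris)
irisdt[Sepal.Length < 5, Sepal.Length := NA_real_]
irisdt[Sepal.Width < 3, Sepal.Width := NA_real_]

# Long format: each original row appears once per measured column,
# with the column name in 'variable' and its value in 'value'
long <- melt(irisdt, measure.vars = c("Sepal.Length", "Sepal.Width"))
names(long)

# Drop rows where the filter column was NA, then summarise per filter column
res <- long[!is.na(value),
            .(max = max(Petal.Length), min = min(Petal.Length)),
            by = variable]
res
# Expect max/min of 6.9/1.2 for Sepal.Length and 6.7/1.0 for Sepal.Width,
# matching desired_table in the question
```

The result has the same content as `desired_table`, except that the filter column is named `variable` rather than `var` and comes first.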

If we are doing this for multiple variables, then use lapply(.SD, ...
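
The answer stops short of showing the lapply(.SD, ...) variant, so the following is only one possible reading of it, not the answerer's code. It assumes the goal is the max and min of several target columns per filter column; `target_cols`, `long`, `res2`, and the `_max`/`_min` suffixes are hypothetical names introduced for illustration:

```r
library(data.table)

# Rebuild the example data with NAs in the two filter columns
irisdt <- as.data.table(iris)
irisdt[Sepal.Length < 5, Sepal.Length := NA_real_]
irisdt[Sepal.Width < 3, Sepal.Width := NA_real_]

# Columns to summarise (an assumption; the question only used Petal.Length)
target_cols <- c("Petal.Length", "Petal.Width")

long <- melt(irisdt, measure.vars = c("Sepal.Length", "Sepal.Width"))

# For each filter column, apply max and min to every target column.
# setNames() gives the two sets of results distinct column names.
res2 <- long[!is.na(value),
             c(setNames(lapply(.SD, max), paste0(target_cols, "_max")),
               setNames(lapply(.SD, min), paste0(target_cols, "_min"))),
             by = variable,
             .SDcols = target_cols]
res2
```

The `Petal.Length_max` and `Petal.Length_min` columns should agree with the `max` and `min` columns of `desired_table` in the question.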

answered Nov 14 '22 20:11

akrun
