data.table alternative for dplyr case_when

Tags: r, if-statement, data.table, dplyr

Some time ago dplyr introduced case_when, a nice SQL-like alternative to ifelse.

Is there an equivalent in data.table that would allow you to specify different conditions within one [] statement, without loading additional packages?

Example:

library(dplyr)

df <- data.frame(a = c("a", "b", "a"), b = c("b", "a", "a"))

df <- df %>% mutate(
  new = case_when(
    a == "a" & b == "b" ~ "c",
    a == "b" & b == "a" ~ "d",
    TRUE ~ "e"
  )
)

  a b new
1 a b   c
2 b a   d
3 a a   e

It would certainly be very helpful and make code much more readable (one of the reasons why I keep using dplyr in these cases).

asked Oct 28 '18 11:10

arg0naut91

4 Answers

FYI, here is a more recent answer for those coming across this post after 2019. data.table 1.13.0 and later provide the fcase function, which can be used for this. It is not a drop-in replacement for dplyr::case_when, because the syntax is different: it uses condition, value pairs instead of condition ~ value formulas. It is, however, the "native" data.table way of doing this calculation.

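# Lazy evaluation: every element of x is already matched by the first two
# conditions, so fcase() never evaluates the stop() value
x = 1:10
data.table::fcase(
    x < 5L, 1L,
    x >= 5L, 3L,
    x == 5L, stop("provided value is an unexpected one!")
)
# [1] 1 1 1 1 3 3 3 3 3 3

# dplyr::case_when() evaluates every right-hand side, so stop() is called
dplyr::case_when(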
    x < 5L ~ 1L,
    x >= 5L ~ 3L,
    x == 5L ~ stop("provided value is an unexpected one!")
)
# Error in eval_tidy(pair$rhs, env = default_env) :
#  provided value is an unexpected one!

# Benchmark
x = sample(1:100, 3e7, replace = TRUE) # 114 MB
# Note: x == 60 matches none of the conditions below, so both calls return NA there
microbenchmark::microbenchmark(
  dplyr::case_when(
    x < 10L ~ 0L,
    x < 20L ~ 10L,
    x < 30L ~ 20L,
    x < 40L ~ 30L,
    x < 50L ~ 40L,
    x < 60L ~ 50L,
    x > 60L ~ 60L
  ),
  data.table::fcase(
    x < 10L, 0L,
    x < 20L, 10L,
    x < 30L, 20L,
    x < 40L, 30L,
    x < 50L, 40L,
    x < 60L, 50L,
    x > 60L, 60L
  ),
  times = 5L,
  unit = "s"
)
# Unit: seconds
#              expr   min    lq  mean median    uq   max neval
#  dplyr::case_when 11.57 11.71 12.22  11.82 12.00 14.02     5
# data.table::fcase  1.49  1.55  1.67   1.71  1.73  1.86     5

Source: data.table NEWS for 1.13.0 (released 24 Jul 2020).
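The examples above do not apply fcase to the question's own data. Below is a minimal sketch of that, assuming data.table 1.13.0 or later is installed. The toy table is rebuilt directly with data.table() so that the block runs on its own. The `default` argument supplies the value for rows where no condition holds, which is the role `TRUE ~ "e"` plays in case_when.

```r
library(data.table)  # fcase() requires data.table >= 1.13.0

# Toy data from the question, built directly as a data.table
DT <- data.table(a = c("a", "b", "a"), b = c("b", "a", "a"))

# fcase() takes condition, value pairs instead of condition ~ value formulas;
# `default` is used for rows where none of the conditions is TRUE
DT[, new := fcase(
  a == "a" & b == "b", "c",
  a == "b" & b == "a", "d",
  default = "e"
)]

print(DT$new)
# [1] "c" "d" "e"
```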

answered Oct 22 '22 23:10

skedaddle_waznook

1) If the conditions are mutually exclusive, with a default for rows where all conditions are false, then this works:

library(data.table)
DT <- as.data.table(df) # df is from question

DT[, new := c("e", "c", "d")[1 +
                             1 * (a == "a" & b == "b") + 
                             2 * (a == "b" & b == "a")]
]

giving:

> DT
   a b new
1: a b   c
2: b a   d
3: a a   e
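To see why this works, here is the same index arithmetic on plain vectors, using base R only as an illustration. Each TRUE condition contributes its weight, so a row gets index 1 (the default "e"), 2 ("c") or 3 ("d"). If both conditions could be TRUE in the same row, the index would become 4 and the lookup would return NA. That is why the conditions must be mutually exclusive.

```r
a <- c("a", "b", "a")
b <- c("b", "a", "a")

# Weighted sum of the logical conditions: 1 = default, 2 = first, 3 = second
idx <- 1 + 1 * (a == "a" & b == "b") + 2 * (a == "b" & b == "a")
print(idx)
# [1] 2 3 1

# Use the index to look up the label for each row
new <- c("e", "c", "d")[idx]
print(new)
# [1] "c" "d" "e"
```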

2) If the results of the conditions are numeric, then it is even easier. For example, suppose that instead of "c" and "d" we want 10 and 17, with a default of 3. Then:

library(data.table)
DT <- as.data.table(df) # df is from question

DT[, new := 3 + 
            (10 - 3) * (a == "a" & b == "b") + 
            (17 - 3) * (a == "b" & b == "a")]
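A quick base R check of this arithmetic, with plain vectors standing in for the data.table columns. Each condition adds the difference between its value and the default. Matching rows therefore end up at 3 + 7 = 10 or 3 + 14 = 17, and all other rows stay at 3.

```r
a <- c("a", "b", "a")
b <- c("b", "a", "a")

# default + (value - default) * condition, summed over the conditions
new <- 3 +
  (10 - 3) * (a == "a" & b == "b") +
  (17 - 3) * (a == "b" & b == "a")
print(new)
# [1] 10 17  3
```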

3) Note that a one-line helper function is enough to implement this. It assumes that at least one leg is TRUE for each row.

when <- function(...) names(match.call()[-1])[apply(cbind(...), 1, which.max)]

# test
DT[, new := when(c = a == 'a' & b == 'b', 
                 d = a == 'b' & b == 'a', 
                 e = TRUE)]
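Here is how the helper works, shown as a base R sketch on plain vectors:

- `match.call()[-1]` keeps the named arguments, so `names()` yields the labels "c", "d" and "e".
- `cbind(...)` stacks the conditions into a logical matrix, recycling the single TRUE.
- `which.max` on each row returns the position of the first TRUE.

When a row contains no TRUE at all, `which.max` returns 1, so that row silently receives the first label. This is the reason for the at-least-one-TRUE assumption. `when_na` is a hypothetical variant, not part of the original answer, that returns NA in that case instead.

```r
when <- function(...) names(match.call()[-1])[apply(cbind(...), 1, which.max)]

a <- c("a", "b", "a")
b <- c("b", "a", "a")

# With a final TRUE leg, every row has a match
with_default <- when(c = a == "a" & b == "b", d = a == "b" & b == "a", e = TRUE)
print(with_default)
# [1] "c" "d" "e"

# Without it, row 3 matches nothing, but which.max() still returns 1, giving "c"
no_default <- when(c = a == "a" & b == "b", d = a == "b" & b == "a")
print(no_default)
# [1] "c" "d" "c"

# Hypothetical variant (not from the answer): NA for rows where no condition is TRUE
when_na <- function(...) {
  m <- cbind(...)
  idx <- apply(m, 1, function(cond_row) {
    if (any(cond_row, na.rm = TRUE)) which.max(cond_row) else NA_integer_
  })
  names(match.call()[-1])[idx]
}
na_version <- when_na(c = a == "a" & b == "b", d = a == "b" & b == "a")
print(na_version)  # row 3 is now NA instead of "c"
```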

answered Oct 22 '22 23:10

G. Grothendieck

This is not really an answer, but a bit too long for a comment. If deemed inappropriate I'm happy to remove the post.

There exists an interesting post on RStudio Community that discusses options to use dplyr::case_when without the usual tidyverse dependencies.

To summarise, three alternatives seem to exist:

1) Stefan Fleck isolated case_when from dplyr and built a new package, lest, that depends only on base.
2) yonicd developed noplyr, which "provides basic dplyr and tidyr functionality without the tidyverse dependencies".
3) Bob Rudis (hrbrmstr) is the creator of freebase, "A 'usethis'-like Package for Base R Pseudo-equivalents of 'tidyverse' Code", which might also be worth checking out.

If it is only case_when that you're after, I imagine lest might be an attractive & minimal option in combination with data.table.

Update [29 October 2019]

Tyson Barrett recently made the package tidyfast available on GitHub (currently as version 0.1.0). It provides "dt_case_when for dplyr::case_when() syntax with the speed of data.table::fifelse()".

Update [25 February 2020]

There is also dtplyr, authored by Lionel Henry and maintained by Hadley Wickham, which "provides a data.table backend for dplyr. The goal of dtplyr is to allow you to write dplyr code that is automatically translated to the equivalent, but usually much faster, data.table code."

answered Oct 22 '22 23:10

Maurits Evers

Here is a variation on @g-grothendieck's answer that also works for conditions that are not mutually exclusive. As with case_when, the first TRUE condition wins:

DT[, new := c("c", "d", "e")[
  apply(cbind(
    a == "a" & b == "b", 
    a == "b" & b == "a",
    TRUE), 1, which.max)]
  ]

DT
#    a b new
# 1: a b   c
# 2: b a   d
# 3: a a   e
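The difference matters once a row can satisfy more than one condition. The small base R illustration below uses made-up overlapping thresholds; the data and labels are hypothetical and not from the question.

- The weighted-index trick from the answer above produces an out-of-range index, and therefore NA, for rows where both tests hold.
- `apply(..., 1, which.max)` picks the first TRUE column, so earlier conditions take precedence, as in case_when.
- `max.col()` with `ties.method = "first"` is swapped in here as a vectorised base R alternative to the row-wise `apply()`; it is not part of the original answer.

```r
x <- c(1, 3, 5, 7)
labels <- c("high", "mid", "low")  # hypothetical labels

# Overlapping conditions: x > 4 implies x > 2; the last column is the default
conds <- cbind(x > 4, x > 2, TRUE)

# First TRUE per row wins, as in dplyr::case_when()
res_apply <- labels[apply(conds, 1, which.max)]
print(res_apply)  # res_apply is c("low", "mid", "high", "high")

# Vectorised alternative: max.col() on a numeric copy, ties resolved to the first column
res_maxcol <- labels[max.col(conds * 1, ties.method = "first")]

# Weighted-index trick: rows where both tests hold get index 4, hence NA
res_arith <- c("low", "high", "mid")[1 + 1 * (x > 4) + 2 * (x > 2)]
print(res_arith)  # res_arith is c("low", "mid", NA, NA)
```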

answered Oct 22 '22 23:10

Moody_Mudskipper
