
use replace_na conditionally

I want to replace missing Revenue values with zero, but only for dates up to and including 16th July 2017, using the tidyverse.

My Data

library(tidyverse)
library(lubridate)

    df <- tribble(
                 ~Date, ~Revenue,
          "2017-07-01",      500,
          "2017-07-02",      501,
          "2017-07-03",      502,
          "2017-07-04",      503,
          "2017-07-05",      504,
          "2017-07-06",      505,
          "2017-07-07",      506,
          "2017-07-08",      507,
          "2017-07-09",      508,
          "2017-07-10",      509,
          "2017-07-11",      510,
          "2017-07-12",      NA,
          "2017-07-13",      NA,
          "2017-07-14",      NA,
          "2017-07-15",      NA,
          "2017-07-16",      NA,
          "2017-07-17",      NA,
          "2017-07-18",      NA,
          "2017-07-19",      NA,
          "2017-07-20",      NA
          )

df$Date <- ymd(df$Date)

Date up to which I want to conditionally replace NAs

max.date <- ymd("2017-07-16")

Output I desire

    # A tibble: 20 × 2
             Date Revenue
           <date>   <dbl>
    1  2017-07-01     500
    2  2017-07-02     501
    3  2017-07-03     502
    4  2017-07-04     503
    5  2017-07-05     504
    6  2017-07-06     505
    7  2017-07-07     506
    8  2017-07-08     507
    9  2017-07-09     508
    10 2017-07-10     509
    11 2017-07-11     510
    12 2017-07-12       0
    13 2017-07-13       0
    14 2017-07-14       0
    15 2017-07-15       0
    16 2017-07-16       0
    17 2017-07-17      NA
    18 2017-07-18      NA
    19 2017-07-19      NA
    20 2017-07-20      NA

The only way I could work this out was to split the df into several parts, replace the NAs in the relevant part, and then rbind the parts back together.

Could someone please help me do this efficiently using the tidyverse?

cephalopod asked Jan 31 '23 00:01


1 Answer

We can use mutate() to replace the NA values in 'Revenue' with 0, using a logical condition that is TRUE only where the element is NA and its 'Date' is less than or equal to 'max.date'. Rows after 'max.date' keep their NA.

df %>% 
  mutate(Revenue = replace(Revenue, is.na(Revenue) & Date <= max.date, 0))
# A tibble: 20 x 2
#         Date Revenue
#       <date>   <dbl>
# 1 2017-07-01     500
# 2 2017-07-02     501
# 3 2017-07-03     502
# 4 2017-07-04     503
# 5 2017-07-05     504
# 6 2017-07-06     505
# 7 2017-07-07     506
# 8 2017-07-08     507
# 9 2017-07-09     508
#10 2017-07-10     509
#11 2017-07-11     510
#12 2017-07-12       0
#13 2017-07-13       0
#14 2017-07-14       0
#15 2017-07-15       0
#16 2017-07-16       0
#17 2017-07-17      NA
#18 2017-07-18      NA
#19 2017-07-19      NA
#20 2017-07-20      NA
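
Since the title asks about replace_na() specifically, here is a minimal sketch of one way to keep it, assuming dplyr and tidyr are installed: wrap replace_na() in if_else() so that it only takes effect on or before the cutoff. The four-row `df_small` is a made-up stand-in for the question's data, not the original `df`.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for the question's data (illustrative values only)
df_small <- tibble(
  Date    = as.Date(c("2017-07-11", "2017-07-12", "2017-07-16", "2017-07-17")),
  Revenue = c(510, NA, NA, NA)
)
max.date <- as.Date("2017-07-16")

# Use the NA-replaced value on or before the cutoff, the original value after it
result <- df_small %>%
  mutate(Revenue = if_else(Date <= max.date, replace_na(Revenue, 0), Revenue))
```

Both branches of if_else() are doubles here, so its type check passes. The replace() version above is shorter, but this form may read more naturally if replace_na() is already used elsewhere in a pipeline.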

The same can be done with data.table by specifying the logical condition in 'i' and assigning (:=) 0 to 'Revenue' for the matching rows:

library(data.table)
setDT(df)[is.na(Revenue) & Date <= max.date, Revenue := 0]
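
Note that setDT() converts its argument to a data.table by reference and := updates columns in place, so the line above changes `df` itself. If the original should stay untouched, one option is to work on a copy. This is a sketch under that assumption; `df_small` and `dt` are illustrative names, not from the question.

```r
library(data.table)

# Hypothetical stand-in for the question's data (illustrative values only)
df_small <- data.frame(
  Date    = as.Date(c("2017-07-12", "2017-07-17")),
  Revenue = c(NA_real_, NA_real_)
)
max.date <- as.Date("2017-07-16")

# Deep-copy first, then convert the copy, so df_small is left as it was
dt <- setDT(copy(df_small))
dt[is.na(Revenue) & Date <= max.date, Revenue := 0]
```

After this, `dt` holds the updated values while `df_small` is still a plain data.frame with its NAs intact.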

Or with base R, by assigning 0 directly to the subset of 'Revenue' that satisfies the same condition:

df$Revenue[is.na(df$Revenue) & df$Date <= max.date] <- 0

akrun answered Feb 03 '23 03:02