Replacing missing values

I have a data frame of quarterly sales. The first quarters have values, and all the following quarters are missing. I would like to replace each NA using a simple formula: 1.05 times the value from the same quarter one year earlier (four rows back), as in the dplyr code below. The problem is that each mutate() call only fills the next four rows, so I have to repeat it once per missing year. Is there a way to fill all the NAs at once?

test <- structure(list(Period = c("1999Q1", "1999Q2", "1999Q3", "1999Q4",
"2000Q1", "2000Q2", "2000Q3", "2000Q4", "2001Q1", "2001Q2", "2001Q3", 
"2001Q4", "2002Q1", "2002Q2", "2002Q3", "2002Q4", "2003Q1", "2003Q2", 
"2003Q3", "2003Q4"), Sales= c(353.2925571, 425.9299841, 357.5204626, 
363.80247, 302.8081066, 394.328576, 435.15573, 387.99768, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA)), class = "data.frame", row.names = c(NA, 
-20L))

library(dplyr)

test %>%
  mutate(Sales = ifelse(is.na(Sales), 1.05 * lag(Sales, 4), Sales)) %>%
  mutate(Sales = ifelse(is.na(Sales), 1.05 * lag(Sales, 4), Sales)) %>%
  mutate(Sales = ifelse(is.na(Sales), 1.05 * lag(Sales, 4), Sales))

asked Aug 31 '19 06:08

AlexB

2 Answers

One dplyr and tidyr possibility could be:

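library(dplyr)
library(tidyr)

test %>%
 # within each quarter, fill the NAs with the last observed value
 group_by(quarter = substr(Period, 5, 6)) %>%
 mutate(Sales_temp = replace_na(Sales, last(na.omit(Sales)))) %>%
 # among the NA rows of each quarter, compound the 5% growth year by year;
 # observed rows keep their original Sales through coalesce()
 group_by(quarter, na = is.na(Sales)) %>%
 mutate(constant = 1.05,
        Sales_temp = Sales_temp * cumprod(constant),
        Sales = coalesce(Sales, Sales_temp)) %>%
 ungroup() %>%
 select(1:2)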

   Period Sales
   <chr>  <dbl>
 1 1999Q1  353.
 2 1999Q2  426.
 3 1999Q3  358.
 4 1999Q4  364.
 5 2000Q1  303.
 6 2000Q2  394.
 7 2000Q3  435.
 8 2000Q4  388.
 9 2001Q1  318.
10 2001Q2  414.
11 2001Q3  457.
12 2001Q4  407.
13 2002Q1  334.
14 2002Q2  435.
15 2002Q3  480.
16 2002Q4  428.
17 2003Q1  351.
18 2003Q2  456.
19 2003Q3  504.
20 2003Q4  449.

Or with just dplyr:

library(dplyr)

test %>%
 group_by(quarter = substr(Period, 5, 6)) %>%
 # same idea as above, with if_else() in place of tidyr::replace_na()
 mutate(Sales_temp = if_else(is.na(Sales), last(na.omit(Sales)), Sales)) %>%
 group_by(quarter, na = is.na(Sales)) %>%
 mutate(constant = 1.05,
        Sales_temp = Sales_temp * cumprod(constant),
        Sales = coalesce(Sales, Sales_temp)) %>%
 ungroup() %>%
 select(1:2)
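
To see why grouping by quarter and using cumprod() works, it may help to look at a single quarter in isolation. Within one quarter, the k-th missing year is just the last observed value times 1.05^k. A minimal base R sketch for the Q1 values from the question (the names q1, last_obs and growth are illustrative, not part of the answer above, and the sketch assumes all NAs come after the observed values):

```r
# Q1 values from the question: 1999Q1, 2000Q1, then three missing years
q1 <- c(353.2925571, 302.8081066, NA, NA, NA)

# last observed Q1 value
last_obs <- tail(q1[!is.na(q1)], 1)

# one growth factor per missing year: 1.05, 1.05^2, 1.05^3
growth <- cumprod(rep(1.05, sum(is.na(q1))))

# fill the missing years with the compounded values
q1[is.na(q1)] <- last_obs * growth
q1
```

The pipelines above do the same thing for all four quarters at once, with group_by() taking care of the per-quarter split.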

answered Oct 01 '22 20:10

tmfmnk

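x <- test$Sales

# find the index of the last non-NA value
last.valid <- tail(which(!is.na(x)), 1)

# store the "base": the most recent observed value for each of Q1..Q4
base <- ceiling(last.valid / 4) * 4 + (-3:0)
# quarters not yet observed in that year fall back to the year before
base <- base + ifelse(base > last.valid, -4, 0)
base <- x[base]

# calculate the "exponents": how many years of growth each row needs
expos <- ceiling((seq(length(x)) - last.valid) / 4)

# base has length 4 and is recycled along x, so the data must start at Q1
test$Sales <- ifelse(is.na(x), base * 1.05 ^ expos, x)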

tail(test)

#    Period    Sales
# 15 2002Q3 479.7592
# 16 2002Q4 427.7674
# 17 2003Q1 350.5382
# 18 2003Q2 456.4846
# 19 2003Q3 503.7472
# 20 2003Q4 449.1558
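
As a cross-check on the vectorised version, a plain loop that applies the question's rule (1.05 times the value four rows back) row by row should give the same numbers. This is only a sketch: it rebuilds the question's data under the illustrative name test2 so it runs on its own, and it assumes the rows are sorted by period and that none of the first four rows is NA.

```r
# rebuild the question's data (test2 is an illustrative name)
test2 <- data.frame(
  Period = paste0(rep(1999:2003, each = 4), "Q", 1:4),
  Sales  = c(353.2925571, 425.9299841, 357.5204626, 363.80247,
             302.8081066, 394.328576, 435.15573, 387.99768,
             rep(NA, 12))
)

x <- test2$Sales

# fill in row order, so each value can build on one filled in earlier
for (i in which(is.na(x))) {
  x[i] <- 1.05 * x[i - 4]
}

test2$Sales <- x
tail(test2)
```

The loop is slower than the vectorised version on long series, but it follows the rule literally and does not depend on the data starting at Q1.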

answered Oct 01 '22 22:10

bluk
