Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to add only missing Dates in Dataframe

Tags:

dataframe

r

dplyr

I have below mentioned data frame:

Date        Val1     Val2
2018-04-01  125      0.05
2018-04-03  458      2.99
2018-04-05  354      1.25

I want to add only missing dates considering Sys.Date() (Here for example Sys.Date() is 2018-04-06) in dataframe with corresponding val1 and val2 as 0.

I have tried:

t2<-merge(data.frame(Date= seq(min(ymd(t1$Date)), max(ymd(date)), by = "days")), t1, by = "Date", all = TRUE)

Required Dataframe:

Date        Val1     Val2
2018-04-01  125      0.05
2018-04-02  0        0
2018-04-03  458      2.99
2018-04-04  0        0
2018-04-05  354      1.25
2018-04-06  0        0
like image 368
Roy1245 Avatar asked May 05 '18 17:05

Roy1245


People also ask

How do I add a missing date to a DataFrame?

To add missing dates to Python Pandas DataFrame, we can use the DatetimeIndex instance's reindex method. We create a date range index with idx = pd. date_range('09-01-2020', '09-30-2020') .


3 Answers

This could be done with complete

library(tidyverse)
df1 %>%
    complete(Date = seq(Date[1], Sys.Date(), by = "1 day"),
                fill = list(Val1 = 0, Val2 = 0))

If we need to pass multiple variables for the fill, create the list of columns that we need to fill

nm1 <- setdiff(names(df1), "Date") #in this example excluding the Date
nm2 <- setNames(as.list(rep(0, length(nm1))), nm1)

and then pass that as argument for the fill

df1 %>% 
     complete(Date = seq(Date[1], Sys.Date(), by = "1 day"), fill = nm2)
# A tibble: 35 x 3
#   Date        Val1  Val2
#   <date>     <dbl> <dbl>
# 1 2018-04-01   125  0.05
# 2 2018-04-02     0  0   
# 3 2018-04-03   458  2.99
# 4 2018-04-04     0  0   
# 5 2018-04-05   354  1.25
# 6 2018-04-06     0  0   
# 7 2018-04-07     0  0   
# 8 2018-04-08     0  0   
# 9 2018-04-09     0  0   
#10 2018-04-10     0  0   
# ... with 25 more rows
like image 76
akrun Avatar answered Oct 22 '22 08:10

akrun


You could use padr. padr is made for filling in missing date values. First you add the missing dates based on the interval, and if you do not want NA's you fill them with a value (or function of most occuring value)

edit: added end_val to include the run until sys.Date()

library(padr)
# Specify end_val to go all the way to sys.Date and add 1 to include sys.Date
padded_df <- pad(df, interval = "day", end_val = Sys.Date()+1)
padded_df <- fill_by_value(padded_df, value = 0)
padded_df

        Date Val1 Val2
1 2018-04-01  125 0.05
2 2018-04-02    0 0.00
3 2018-04-03  458 2.99
4 2018-04-04    0 0.00
5 2018-04-05  354 1.25
.....

31 2018-05-01    0    0
32 2018-05-02    0    0
33 2018-05-03    0    0
34 2018-05-04    0    0
35 2018-05-05    0    0
36 2018-05-06    0    0
like image 31
phiver Avatar answered Oct 22 '22 06:10

phiver


Here's a correction of your approach, in base R.

Replace max(t1$Date) bySys.Date() in your real application:

t2<-merge(data.frame(Date= as.Date(min(t1$Date):max(t1$Date),"1970-1-1")),
          t1, by = "Date", all = TRUE)
t2[is.na(t2)] <- 0

#         Date Val1 Val2
# 1 2018-04-01  125 0.05
# 2 2018-04-02    0 0.00
# 3 2018-04-03  458 2.99
# 4 2018-04-04    0 0.00
# 5 2018-04-05  354 1.25

data

t1 <- read.table(text="Date        Val1     Val2
'2018-04-01'  125 0.05
'2018-04-03'  458 2.99
'2018-04-05'  354 1.25",h=T,strin=F)
t1$Date <- as.Date(df$Date)
like image 2
Moody_Mudskipper Avatar answered Oct 22 '22 07:10

Moody_Mudskipper