I have below mentioned data frame:
Date Val1 Val2
2018-04-01 125 0.05
2018-04-03 458 2.99
2018-04-05 354 1.25
I want to add only missing dates considering Sys.Date() (Here for example Sys.Date() is 2018-04-06) in dataframe with corresponding val1 and val2 as 0.
I have tried:
t2<-merge(data.frame(Date= seq(min(ymd(t1$Date)), max(ymd(date)), by = "days")), t1, by = "Date", all = TRUE)
Required Dataframe:
Date Val1 Val2
2018-04-01 125 0.05
2018-04-02 0 0
2018-04-03 458 2.99
2018-04-04 0 0
2018-04-05 354 1.25
2018-04-06 0 0
To add missing dates to Python Pandas DataFrame, we can use the DatetimeIndex instance's reindex method. We create a date range index with idx = pd. date_range('09-01-2020', '09-30-2020') .
This could be done with complete
library(tidyverse)
df1 %>%
complete(Date = seq(Date[1], Sys.Date(), by = "1 day"),
fill = list(Val1 = 0, Val2 = 0))
If we need to pass multiple variables for the fill, create the list of columns that we need to fill
nm1 <- setdiff(names(df1), "Date") #in this example excluding the Date
nm2 <- setNames(as.list(rep(0, length(nm1))), nm1)
and then pass that as argument for the fill
df1 %>%
complete(Date = seq(Date[1], Sys.Date(), by = "1 day"), fill = nm2)
# A tibble: 35 x 3
# Date Val1 Val2
# <date> <dbl> <dbl>
# 1 2018-04-01 125 0.05
# 2 2018-04-02 0 0
# 3 2018-04-03 458 2.99
# 4 2018-04-04 0 0
# 5 2018-04-05 354 1.25
# 6 2018-04-06 0 0
# 7 2018-04-07 0 0
# 8 2018-04-08 0 0
# 9 2018-04-09 0 0
#10 2018-04-10 0 0
# ... with 25 more rows
You could use padr. padr is made for filling in missing date values.
First you add the missing dates based on the interval, and if you do not want NA's you fill them with a value (or function of most occuring value)
edit: added end_val to include the run until sys.Date()
library(padr)
# Specify end_val to go all the way to sys.Date and add 1 to include sys.Date
padded_df <- pad(df, interval = "day", end_val = Sys.Date()+1)
padded_df <- fill_by_value(padded_df, value = 0)
padded_df
Date Val1 Val2
1 2018-04-01 125 0.05
2 2018-04-02 0 0.00
3 2018-04-03 458 2.99
4 2018-04-04 0 0.00
5 2018-04-05 354 1.25
.....
31 2018-05-01 0 0
32 2018-05-02 0 0
33 2018-05-03 0 0
34 2018-05-04 0 0
35 2018-05-05 0 0
36 2018-05-06 0 0
Here's a correction of your approach, in base R.
Replace max(t1$Date) bySys.Date() in your real application:
t2<-merge(data.frame(Date= as.Date(min(t1$Date):max(t1$Date),"1970-1-1")),
t1, by = "Date", all = TRUE)
t2[is.na(t2)] <- 0
# Date Val1 Val2
# 1 2018-04-01 125 0.05
# 2 2018-04-02 0 0.00
# 3 2018-04-03 458 2.99
# 4 2018-04-04 0 0.00
# 5 2018-04-05 354 1.25
data
t1 <- read.table(text="Date Val1 Val2
'2018-04-01' 125 0.05
'2018-04-03' 458 2.99
'2018-04-05' 354 1.25",h=T,strin=F)
t1$Date <- as.Date(df$Date)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With