I have a data set that looks like this:
shop_id,item_id,time,value
150,1,2015-07-10,3
150,1,2015-07-11,5
150,1,2015-07-13,2
150,2,2015-07-10,15
150,2,2015-07-12,12
Within each group, defined by "shop_id and "item_id", there are missing dates.
I wish to expand this irregular the time series to a regular, with consecutive dates, within each group:
shop_id,item_id,time,value
150,1,2015-07-10,3
150,1,2015-07-11,5
150,1,2015-07-12,0 # <~~ added
150,1,2015-07-13,2
150,2,2015-07-10,15
150,2,2015-07-11,0 # <~~ added
150,2,2015-07-12,12
For the dates which are added, the corresponding values should by zero. I've read very similar questions though (either using R or SQL coalescing), but most of the solutions I've seen doesn't involve GROUP BYs.
Basically I have access to the SQL database/I can export as CSV for manipulation preferably in C#. Was hoping to find C# libraries that can do such data manipulation but couldn't find any.
Any advice or help is appreciated!
You can use data.table
from R
. Assuming that 'time' column is of 'Date' class,
library(data.table)#v1.9.5+
DT1 <- setDT(df1)[, list(time=seq(min(time), max(time), by ='day')),
by =.(shop_id, item_id)]
setkeyv(df1, names(df1)[1:3])[DT1][is.na(value), value:=0]
# shop_id item_id time value
#1: 150 1 2015-07-10 3
#2: 150 1 2015-07-11 5
#3: 150 1 2015-07-12 0
#4: 150 1 2015-07-13 2
#5: 150 2 2015-07-10 15
#6: 150 2 2015-07-11 0
#7: 150 2 2015-07-12 12
In the devel version, you can also do this without setting the 'key'. Instructions to install the devel version are here
df1[DT1, on =c('shop_id', 'item_id', 'time')][is.na(value), value:=0]
# shop_id item_id time value
#1: 150 1 2015-07-10 3
#2: 150 1 2015-07-11 5
#3: 150 1 2015-07-12 0
#4: 150 1 2015-07-13 2
#5: 150 2 2015-07-10 15
#6: 150 2 2015-07-11 0
#7: 150 2 2015-07-12 12
Or as @Arun suggested, a more efficient option would be
DT1[, value := 0L][df1, value := i.value, on = c('shop_id', 'item_id', 'time')]
DT1
This is a Sql based solution
First you need a dates
table
Date table query. Note this will create a physical table in your database.
;with cte as
(
select cast('2000-01-01' as datetime) as Dates -- Start date
union all
select dateadd(MM,1,Dates)
from cte
where Dates < '2099-12-01' -- End date
)
select *
INTO Date_table
from CTE
Then you need to left outer join
your table with Date_table
to get the missing dates.
SELECT A.shop_id,
A.item_id,
DT.dates,
Isnull(Y.value, 0)
FROM date_table DT
CROSS JOIN(SELECT DISTINCT shop_id,
item_id
FROM yourtable) A
LEFT OUTER JOIN yourtable Y
ON t.[time] = DT.dates
AND A.shop_id = Y.shop_id
AND A.item_id = Y.item_id
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With