I have some OHLCV data stored in TimescaleDB that is missing rows during certain time ranges. This data needs to be resampled to a different time period (e.g. 1 day) and returned as contiguous, ordered time buckets. TimescaleDB provides the function time_bucket_gapfill to do this. My current query is:
SELECT
    time_bucket_gapfill(
        '1 day',
        "timestamp",
        '2017-07-25 00:00',
        '2018-01-01 00:00'
    ) AS date,
    FIRST(open, "timestamp") AS open,
    MAX(high) AS high,
    MIN(low) AS low,
    LAST(close, "timestamp") AS close,
    SUM(volume) AS volume
FROM ohlcv
WHERE "timestamp" > '2017-07-25'
GROUP BY date
ORDER BY date ASC
LIMIT 10
which gives the following results:
date                      open      high      low       close     volume
2017-07-25 00:00:00+00
2017-07-26 00:00:00+00
2017-07-27 00:00:00+00    0.00992   0.010184  0.009679  0.010039  65553.5299999999
2017-07-28 00:00:00+00    0.00999   0.010059  0.009225  0.009248  43049.93
2017-07-29 00:00:00+00
2017-07-30 00:00:00+00    0.009518  0.0098    0.009286  0.009457  40510.0599999999
...
Question: It looks like only the date column has been gap-filled. By modifying the SQL statement, is it possible to also gap-fill the columns open, high, low, close, and volume, such that we obtain the following results:
date                      open      high      low       close     volume
2017-07-25 00:00:00+00    0         0         0         0         0
2017-07-26 00:00:00+00    0         0         0         0         0
2017-07-27 00:00:00+00    0.00992   0.010184  0.009679  0.010039  65553.5299999999
2017-07-28 00:00:00+00    0.00999   0.010059  0.009225  0.009248  43049.93
2017-07-29 00:00:00+00    0.009248  0.009248  0.009248  0.009248  0
2017-07-30 00:00:00+00    0.009518  0.0098    0.009286  0.009457  40510.0599999999
...
Or is it recommended to perform this data imputation after receiving the query result, for example in Python or Node.js? I would prefer to do this gap-filling/imputation in TimescaleDB rather than in my Node.js app, because doing it in Node.js would be much slower, and I do not want to introduce Python into the app just for this processing.
One option is to perform the imputation in pandas after receiving the query result:

import pandas as pd

# Build a test dataset simulating the missing values left by time_bucket_gapfill
data = [
(pd.Timestamp('2020-01-01'), None, None, None, None, None),
(pd.Timestamp('2020-01-02'), 100, 110, 90, 95, 3),
(pd.Timestamp('2020-01-03'), None, None, None, None, None),
(pd.Timestamp('2020-01-04'), 98, 150, 100, 100, 4),
]
df = pd.DataFrame(data, columns=['date', 'open', 'high', 'low', 'close', 'volume']).set_index('date')
# open high low close volume
# date
# 2020-01-01 NaN NaN NaN NaN NaN
# 2020-01-02 100.0 110.0 90.0 95.0 3.0
# 2020-01-03 NaN NaN NaN NaN NaN
# 2020-01-04 98.0 150.0 100.0 100.0 4.0
# Perform the gap filling
df['close'] = df['close'].ffill()              # carry the last known close forward
df['volume'] = df['volume'].fillna(0)          # fill missing volume with 0
df['open'] = df['open'].fillna(df['close'])    # fill missing open with the forward-filled close
df['high'] = df['high'].fillna(df['close'])    # fill missing high with the forward-filled close
df['low'] = df['low'].fillna(df['close'])      # fill missing low with the forward-filled close
df = df.fillna(0)                              # fill remaining OHLC NaNs with 0 if no previous close exists
# open high low close volume
# date
# 2020-01-01 0.0 0.0 0.0 0.0 0.0
# 2020-01-02 100.0 110.0 90.0 95.0 3.0
# 2020-01-03 95.0 95.0 95.0 95.0 0.0
# 2020-01-04 98.0 150.0 100.0 100.0 4.0
SELECT "tickerId",
"ts",
coalesce("open", "close") "open",
coalesce("high", "close") "high",
coalesce("low", "close") "low",
coalesce("close", "close") "close",
coalesce("volume", 0) "volume",
coalesce("count", 0) "count"
FROM (
SELECT "tickerId",
time_bucket_gapfill('1 hour', at) "ts",
first(price, "eId") "open",
MAX(price) "high",
MIN(price) "low",
locf(last(price, "eId")) "close",
SUM(volume) "volume",
COUNT(1) "count"
FROM "PublicTrades"
WHERE at >= date_trunc('day', now() - INTERVAL '1 year')
AND at < NOW()
GROUP BY "tickerId", "ts"
ORDER BY "tickerId", "ts" DESC
LIMIT 100
) AS P
Notice: eId is the Exchange Public Trade ID. The inner query carries the last close forward with locf, and the outer query falls back to that close for open, high and low, and to 0 for volume and count.
You need to specify for each column how the gap-filling should be performed. My guess is that you probably want to use locf. See:
https://docs.timescale.com/latest/api#time_bucket_gapfill
https://docs.timescale.com/latest/api#locf
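For reference, a minimal sketch of how the query from the question could be adapted along these lines, using locf for close and COALESCE for the remaining columns. Column names and fill values are taken from the question; falling back to 0 for buckets that precede any data is an assumption made here to match the desired output:

SELECT
    date,
    COALESCE(open,  close, 0) AS open,    -- fall back to the carried-forward close, then 0
    COALESCE(high,  close, 0) AS high,
    COALESCE(low,   close, 0) AS low,
    COALESCE(close, 0)        AS close,   -- 0 only for buckets before the first data point
    COALESCE(volume, 0)       AS volume
FROM (
    SELECT
        time_bucket_gapfill(
            '1 day',
            "timestamp",
            '2017-07-25 00:00',
            '2018-01-01 00:00'
        ) AS date,
        FIRST(open, "timestamp") AS open,
        MAX(high) AS high,
        MIN(low) AS low,
        locf(LAST(close, "timestamp")) AS close,  -- carry the last close forward into empty buckets
        SUM(volume) AS volume
    FROM ohlcv
    WHERE "timestamp" > '2017-07-25'
    GROUP BY date
) AS g
ORDER BY date ASC
LIMIT 10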