I have a DataFrame something like this:
data = [['2019-01-01', .1, .2],
['2019-01-02', .5, .3],
['2019-01-03', .2, .4]]
df = pd.DataFrame(data, columns=['date', 'hour01', 'hour02'])
date hour01 hour02
0 2019-01-01 0.1 0.2
1 2019-01-02 0.5 0.3
2 2019-01-03 0.2 0.4
How to melt it so that I get proper 15 minutes intervals? Like this:
timestamp value
0 2019-01-01 00:00:00 0.1
1 2019-01-01 00:15:00 0.1
2 2019-01-01 00:30:00 0.1
3 2019-01-01 00:45:00 0.1
4 2019-01-01 01:00:00 0.2
5 2019-01-01 01:15:00 0.2
6 2019-01-01 01:30:00 0.2
7 2019-01-01 01:45:00 0.2
...
16 2019-01-03 00:00:00 0.2
17 2019-01-03 00:15:00 0.2
18 2019-01-03 00:30:00 0.2
19 2019-01-03 00:45:00 0.2
20 2019-01-03 01:00:00 0.4
21 2019-01-03 01:15:00 0.4
22 2019-01-03 01:30:00 0.4
23 2019-01-03 01:45:00 0.4
df.melt(id_vars=['timestamp'], value_vars=['hour_{}'.format(str(x).zfill(2)) for x in range(1, 24)])
gives me this:
27 2017-01-28 hour_01 34.90
28 2017-01-29 hour_01 36.04
29 2017-01-30 hour_01 36.51
... ... ...
16760 2018-12-02 hour_23 51.50
16761 2018-12-03 hour_23 54.00
16762 2018-12-04 hour_23 53.87
Where to go from here?
Maybe you can do it starting with melt too, but unless using melt is by some reason a requirement, you can obtain it in this way:
'date' a datetime column, if not already. groupby and apply you can generate the timestamps for all the time intervals using pandas date_range and spanning the hourly values using numpy repeat.Translated in a code is:
df['date'] = pd.to_datetime(df['date'])
ddf = df.groupby('date').apply(lambda row : pd.DataFrame(
{'timestamp' : pd.date_range(row['date'].iloc[0], periods=4*len(df.columns[1:]), freq='15T'),
'value' : np.repeat(np.array([row[col].iloc[0] for col in df.columns[1:]]), 4)}))
ddf.reset_index(inplace=True, drop=True)
Using your starting dataframe, ddf is:
timestamp value
0 2019-01-01 00:00:00 0.1
1 2019-01-01 00:15:00 0.1
2 2019-01-01 00:30:00 0.1
3 2019-01-01 00:45:00 0.1
4 2019-01-01 01:00:00 0.2
5 2019-01-01 01:15:00 0.2
6 2019-01-01 01:30:00 0.2
7 2019-01-01 01:45:00 0.2
8 2019-01-02 00:00:00 0.5
9 2019-01-02 00:15:00 0.5
10 2019-01-02 00:30:00 0.5
11 2019-01-02 00:45:00 0.5
12 2019-01-02 01:00:00 0.3
13 2019-01-02 01:15:00 0.3
14 2019-01-02 01:30:00 0.3
15 2019-01-02 01:45:00 0.3
16 2019-01-03 00:00:00 0.2
17 2019-01-03 00:15:00 0.2
18 2019-01-03 00:30:00 0.2
19 2019-01-03 00:45:00 0.2
20 2019-01-03 01:00:00 0.4
21 2019-01-03 01:15:00 0.4
22 2019-01-03 01:30:00 0.4
23 2019-01-03 01:45:00 0.4
This code will automatically pick how many columns you have after 'date', assuming that they are all 'hour' columns. If you have other columns mixed in the dataframe, they should be filtered out from df.columns[1:].
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With