
How to properly extend a pandas dataframe?

Given a df:

Date and Time,Open,High,Low,Close                 
07/13/2017 15:55,1.1436,1.1436,1.1436,1.1436
07/13/2017 15:56,1.1435,1.1435,1.1435,1.1435
07/13/2017 15:57,1.1434,1.1434,1.1434,1.1434
07/13/2017 15:58,1.1436,1.1436,1.1436,1.1436
07/13/2017 15:59,1.1437,1.1437,1.1437,1.1437

How can I extend the index and fill the new rows with the values from the last row? The resulting output would look something like this:

Extend df 24h, at 1min frequency:

Date and Time,Open,High,Low,Close
07/13/2017 15:55,1.1436,1.1436,1.1436,1.1436   |
07/13/2017 15:56,1.1435,1.1435,1.1435,1.1435   |
07/13/2017 15:57,1.1434,1.1434,1.1434,1.1434   | --> Existing Data
07/13/2017 15:58,1.1436,1.1436,1.1436,1.1436   |
07/13/2017 15:59,1.1437,1.1437,1.1437,1.1437   |
07/13/2017 16:00,1.1437,1.1437,1.1437,1.1437   
07/13/2017 16:01,1.1437,1.1437,1.1437,1.1437
07/13/2017 16:02,1.1437,1.1437,1.1437,1.1437
07/13/2017 16:03,1.1437,1.1437,1.1437,1.1437
...
...
...
07/14/2017 15:57,1.1437,1.1437,1.1437,1.1437
07/14/2017 15:58,1.1437,1.1437,1.1437,1.1437
07/14/2017 15:59,1.1437,1.1437,1.1437,1.1437
asked Oct 30 '22 by hernanavella

1 Answer

One way to do this (hopefully not the only way) is to build a date_range that starts at the first value in the Date and Time column and covers the original rows plus 24 hours, in one-minute increments (1440 + len(df) periods), turn that range into a dataframe, and then left-merge your original dataframe into it and forward-fill. The following is an illustration:

import pandas as pd

# Sample data from the question
d = {'Date and Time': [pd.Timestamp('2017-07-13 15:55:00'),
                       pd.Timestamp('2017-07-13 15:56:00'),
                       pd.Timestamp('2017-07-13 15:57:00'),
                       pd.Timestamp('2017-07-13 15:58:00'),
                       pd.Timestamp('2017-07-13 15:59:00')],
     'Open': [1.1436, 1.1435, 1.1434, 1.1436, 1.1437],
     'High': [1.1436, 1.1435, 1.1434, 1.1436, 1.1437],
     'Low': [1.1436, 1.1435, 1.1434, 1.1436, 1.1437],
     'Close': [1.1436, 1.1435, 1.1434, 1.1436, 1.1437]}

df = pd.DataFrame(d)

# One-minute range starting at the first timestamp and running for the
# original rows plus 24 hours (1440 minutes) beyond them
df1 = (pd.date_range(start=df['Date and Time'][0], periods=1440 + len(df), freq='1min')
         .to_series()
         .to_frame('Date and Time'))

# Left-merge on the shared 'Date and Time' column, then forward-fill the
# prices from the last known row into the new timestamps
print(df1.merge(df, how='left').ffill())

Should yield something like the following:

           Date and Time    Open    High     Low   Close
0    2017-07-13 15:55:00  1.1436  1.1436  1.1436  1.1436
1    2017-07-13 15:56:00  1.1435  1.1435  1.1435  1.1435
2    2017-07-13 15:57:00  1.1434  1.1434  1.1434  1.1434
3    2017-07-13 15:58:00  1.1436  1.1436  1.1436  1.1436
4    2017-07-13 15:59:00  1.1437  1.1437  1.1437  1.1437
5    2017-07-13 16:00:00  1.1437  1.1437  1.1437  1.1437
6    2017-07-13 16:01:00  1.1437  1.1437  1.1437  1.1437
7    2017-07-13 16:02:00  1.1437  1.1437  1.1437  1.1437
8    2017-07-13 16:03:00  1.1437  1.1437  1.1437  1.1437
9    2017-07-13 16:04:00  1.1437  1.1437  1.1437  1.1437
10   2017-07-13 16:05:00  1.1437  1.1437  1.1437  1.1437
11   2017-07-13 16:06:00  1.1437  1.1437  1.1437  1.1437
12   2017-07-13 16:07:00  1.1437  1.1437  1.1437  1.1437
13   2017-07-13 16:08:00  1.1437  1.1437  1.1437  1.1437
14   2017-07-13 16:09:00  1.1437  1.1437  1.1437  1.1437
15   2017-07-13 16:10:00  1.1437  1.1437  1.1437  1.1437
16   2017-07-13 16:11:00  1.1437  1.1437  1.1437  1.1437
17   2017-07-13 16:12:00  1.1437  1.1437  1.1437  1.1437
18   2017-07-13 16:13:00  1.1437  1.1437  1.1437  1.1437
19   2017-07-13 16:14:00  1.1437  1.1437  1.1437  1.1437
20   2017-07-13 16:15:00  1.1437  1.1437  1.1437  1.1437
21   2017-07-13 16:16:00  1.1437  1.1437  1.1437  1.1437
22   2017-07-13 16:17:00  1.1437  1.1437  1.1437  1.1437
23   2017-07-13 16:18:00  1.1437  1.1437  1.1437  1.1437
24   2017-07-13 16:19:00  1.1437  1.1437  1.1437  1.1437
25   2017-07-13 16:20:00  1.1437  1.1437  1.1437  1.1437
26   2017-07-13 16:21:00  1.1437  1.1437  1.1437  1.1437
27   2017-07-13 16:22:00  1.1437  1.1437  1.1437  1.1437
28   2017-07-13 16:23:00  1.1437  1.1437  1.1437  1.1437
29   2017-07-13 16:24:00  1.1437  1.1437  1.1437  1.1437
...                  ...     ...     ...     ...     ...
1415 2017-07-14 15:30:00  1.1437  1.1437  1.1437  1.1437
1416 2017-07-14 15:31:00  1.1437  1.1437  1.1437  1.1437
1417 2017-07-14 15:32:00  1.1437  1.1437  1.1437  1.1437
1418 2017-07-14 15:33:00  1.1437  1.1437  1.1437  1.1437
1419 2017-07-14 15:34:00  1.1437  1.1437  1.1437  1.1437
1420 2017-07-14 15:35:00  1.1437  1.1437  1.1437  1.1437
1421 2017-07-14 15:36:00  1.1437  1.1437  1.1437  1.1437
1422 2017-07-14 15:37:00  1.1437  1.1437  1.1437  1.1437
1423 2017-07-14 15:38:00  1.1437  1.1437  1.1437  1.1437
1424 2017-07-14 15:39:00  1.1437  1.1437  1.1437  1.1437
1425 2017-07-14 15:40:00  1.1437  1.1437  1.1437  1.1437
1426 2017-07-14 15:41:00  1.1437  1.1437  1.1437  1.1437
1427 2017-07-14 15:42:00  1.1437  1.1437  1.1437  1.1437
1428 2017-07-14 15:43:00  1.1437  1.1437  1.1437  1.1437
1429 2017-07-14 15:44:00  1.1437  1.1437  1.1437  1.1437
1430 2017-07-14 15:45:00  1.1437  1.1437  1.1437  1.1437
1431 2017-07-14 15:46:00  1.1437  1.1437  1.1437  1.1437
1432 2017-07-14 15:47:00  1.1437  1.1437  1.1437  1.1437
1433 2017-07-14 15:48:00  1.1437  1.1437  1.1437  1.1437
1434 2017-07-14 15:49:00  1.1437  1.1437  1.1437  1.1437
1435 2017-07-14 15:50:00  1.1437  1.1437  1.1437  1.1437
1436 2017-07-14 15:51:00  1.1437  1.1437  1.1437  1.1437
1437 2017-07-14 15:52:00  1.1437  1.1437  1.1437  1.1437
1438 2017-07-14 15:53:00  1.1437  1.1437  1.1437  1.1437
1439 2017-07-14 15:54:00  1.1437  1.1437  1.1437  1.1437
1440 2017-07-14 15:55:00  1.1437  1.1437  1.1437  1.1437
1441 2017-07-14 15:56:00  1.1437  1.1437  1.1437  1.1437
1442 2017-07-14 15:57:00  1.1437  1.1437  1.1437  1.1437
1443 2017-07-14 15:58:00  1.1437  1.1437  1.1437  1.1437
1444 2017-07-14 15:59:00  1.1437  1.1437  1.1437  1.1437
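
For comparison, a slightly different route to the same result (a minimal sketch, reusing df and the 1440 + len(df) range from the snippet above) is to set 'Date and Time' as the index and reindex onto the extended range, letting method='ffill' carry the last row forward in one step:

# Same extended one-minute range as above
full_range = pd.date_range(start=df['Date and Time'][0],
                           periods=1440 + len(df), freq='1min')

# Reindex onto the full range; rows beyond the original data are forward-filled
extended = (df.set_index('Date and Time')
              .reindex(full_range, method='ffill')
              .rename_axis('Date and Time')
              .reset_index())

print(extended.tail())

Both versions keep 'Date and Time' as a regular column; if a DatetimeIndex is preferred, drop the reset_index() call.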

I hope this helps.

answered Nov 09 '22 by Abdou