I have an Excel sheet with dates and some values. I want to load it into a pandas DataFrame and select only the rows that fall between certain dates.
For some reason I cannot select a row by its date index.
Raw Data in Excel file
MCU
Timestamp 50D 10P1 10P2 10P3 10P6 10P9 10P12
12-Feb-15 25.17 5.88 5.92 5.98 6.18 6.23 6.33
11-Feb-15 25.9 6.05 6.09 6.15 6.28 6.31 6.39
10-Feb-15 26.38 5.94 6.05 6.15 6.33 6.39 6.46
Code
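import pandas as pd
xls = pd.ExcelFile('e:/Data.xlsx')
# header=1: the column names are on the second row, below the 'MCU' title row
vols = xls.parse(asset.upper() + 'VOL', header=1)
vols.set_index('Timestamp', inplace=True)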
Data before set_index
Timestamp 50D 10P1 10P2 10P3 10P6 10P9 10P12 25P1 25P2 \
0 2015-02-12 25.17 5.88 5.92 5.98 6.18 6.23 6.33 2.98 3.08
1 2015-02-11 25.90 6.05 6.09 6.15 6.28 6.31 6.39 3.12 3.17
2 2015-02-10 26.38 5.94 6.05 6.15 6.33 6.39 6.46 3.01 3.16
Data after set_index
50D 10P1 10P2 10P3 10P6 10P9 10P12 25P1 25P2 25P3 \
Timestamp
2015-02-12 25.17 5.88 5.92 5.98 6.18 6.23 6.33 2.98 3.08 3.21
2015-02-11 25.90 6.05 6.09 6.15 6.28 6.31 6.39 3.12 3.17 3.32
2015-02-10 26.38 5.94 6.05 6.15 6.33 6.39 6.46 3.01 3.16 3.31
Output
>>> vols.index
<class 'pandas.tseries.index.DatetimeIndex'>
[2015-02-12, ..., NaT]
Length: 1478, Freq: None, Timezone: None
>>> vols[date(2015,2,12)]
*** KeyError: datetime.date(2015, 2, 12)
I would expect this not to fail, and I should also be able to select a range of dates. I have tried many combinations without success.
pandas.Timestamp is the pandas replacement for Python's datetime.datetime object and is interchangeable with it in most cases. It is the type used for the entries that make up a DatetimeIndex and other time-series-oriented data structures in pandas.
pandas Timestamp objects can be compared with the ordinary comparison operators: >, <, ==, <=, >=. The difference between two timestamps can be calculated with the '-' operator. A given time can be converted to a pandas timestamp with pandas.Timestamp().
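As a small illustration of the comparison and subtraction described above, here is a sketch that converts a datetime.date (the type used in the question) into a Timestamp first. The variable names are made up for the example.

```python
from datetime import date

import pandas as pd

# Build timestamps from a string and from a plain datetime.date.
start = pd.Timestamp("2015-02-10")
end = pd.Timestamp(date(2015, 2, 12))

# Ordinary comparison operators work between Timestamp objects.
is_later = end > start

# Subtracting two Timestamps yields a pandas Timedelta.
gap = end - start
print(is_later, gap.days)
```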
Using a datetime.date instance to try to retrieve the index won't work; you just need a string representation of the date, e.g. '2015-02-12' or '2015/02/14'.
Secondly, vols[date(2015,2,12)] is actually looking in your DataFrame's column headings, not the index. You can use loc to fetch row index labels instead. For example, you could write vols.loc['2015-02-12'].
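Here is a minimal sketch of the loc approach that also covers the date-range selection the question asks about. The small frame is a hypothetical stand-in for the parsed sheet, using a few values copied from the question. Dropping NaT rows and sorting the index are assumptions on my part, added because the printed index is in descending order and ends in NaT; they are not something the original code does.

```python
import pandas as pd

# Hypothetical stand-in for the parsed sheet (values taken from the question).
vols = pd.DataFrame(
    {
        "Timestamp": pd.to_datetime(["2015-02-12", "2015-02-11", "2015-02-10"]),
        "50D": [25.17, 25.90, 26.38],
        "10P1": [5.88, 6.05, 5.94],
    }
).set_index("Timestamp")

# Assumption: drop rows whose index is NaT and sort ascending,
# so that label-based slicing is well defined.
vols = vols[vols.index.notna()].sort_index()

# Single day: a date string with .loc looks up row labels, not columns.
one_day = vols.loc["2015-02-12"]

# Range of dates: with .loc both endpoints are included.
date_range = vols.loc["2015-02-10":"2015-02-11"]

# Equivalent boolean mask, which does not rely on the index being sorted.
mask = (vols.index >= pd.Timestamp("2015-02-10")) & (vols.index <= pd.Timestamp("2015-02-11"))
masked = vols[mask]
```

The boolean mask is the more defensive of the two if you are not sure the index is sorted.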