How to access pandas DataFrame datetime index using strings

Tags: python, pandas

This is a very simple and practical question. I have the feeling it must be a silly detail and that there should be similar questions, but I wasn't able to find them. If someone does, I'll happily delete this one.

The closest questions I found were "pandas: iterating over DataFrame index with loc" and "How to select rows within a pandas dataframe based on time only when index is date and time".

Anyway, here is the thing: I have a pandas DataFrame with a datetime index, as follows:

In[81]: y
Out[81]: 
            PETR4  CSNA3  VALE5
2008-01-01    0.0    0.0    0.0
2008-01-02    1.0    1.0    1.0
2008-01-03    7.0    7.0    7.0

In[82]: y.index
Out[82]: DatetimeIndex(['2008-01-01', '2008-01-02', '2008-01-03'], dtype='datetime64[ns]', freq=None)

Oddly enough, I can't access its rows using any of the following:

In[83]: y[datetime.datetime(2008,1,1)]
In[84]: y['2008-1-1']
In[85]: y['1/1/2008']

Each of these raises a KeyError.

Even stranger, the following DO work:

In[86]: y['2008']
Out[86]: 
            PETR4  CSNA3  VALE5
2008-01-01    0.0    0.0    0.0
2008-01-02    1.0    1.0    1.0
2008-01-03    7.0    7.0    7.0

In[87]: y['2008-1']
Out[87]: 
            PETR4  CSNA3  VALE5
2008-01-01    0.0    0.0    0.0
2008-01-02    1.0    1.0    1.0
2008-01-03    7.0    7.0    7.0

I'm fairly new to pandas, so maybe I'm missing something here?


asked Apr 26 '16 16:04

Pedro Braz


2 Answers

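pandas looks at what's inside the [] and decides what to do with it. If it's a column name, it returns that column; if it's a list of column names, it returns a DataFrame with those columns. If it's a range (slice) of index values, it returns the matching rows. What it does not handle is a single index value: a date string with the same precision as the index is looked up as a column name, hence the KeyError.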
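A minimal sketch of that dispatch, assuming a reasonably recent pandas and rebuilding the question's frame by hand (the name `y` and the values are copied from the question): a single string is treated as a column label, while a slice always selects rows.

```python
import pandas as pd

# Rebuild the question's frame: a daily DatetimeIndex with three stock columns.
y = pd.DataFrame(
    {"PETR4": [0.0, 1.0, 7.0], "CSNA3": [0.0, 1.0, 7.0], "VALE5": [0.0, 1.0, 7.0]},
    index=pd.to_datetime(["2008-01-01", "2008-01-02", "2008-01-03"]),
)

# A single string that matches a column name returns that column as a Series.
petr4 = y["PETR4"]

# A slice is always interpreted as a row selection.
first_row = y["2008-01-01":"2008-01-01"]

# A single date string at the index's own (daily) precision is an exact match,
# so [] looks for a column with that name and raises KeyError.
try:
    y["2008-01-01"]
    raised = False
except KeyError:
    raised = True
```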

Solution

Two workarounds:

1. Turn the argument into something pandas interprets as a range, i.e. a slice:

df['2008-01-01':'2008-01-01']

2. Use the indexer designed to give you this result, loc[]:

df.loc['2008-01-01']

Link to the documentation
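A short sketch comparing the two workarounds on the question's data (rebuilt by hand as `y`; assumes a reasonably recent pandas). The slice keeps a one-row DataFrame, while `.loc` with an exact date returns the row as a Series. `.loc` also accepts a `datetime.datetime`, which covers the question's first attempt.

```python
import datetime

import pandas as pd

y = pd.DataFrame(
    {"PETR4": [0.0, 1.0, 7.0], "CSNA3": [0.0, 1.0, 7.0], "VALE5": [0.0, 1.0, 7.0]},
    index=pd.to_datetime(["2008-01-01", "2008-01-02", "2008-01-03"]),
)

# Workaround 1: a one-element slice keeps the result a DataFrame (one row).
as_frame = y["2008-01-01":"2008-01-01"]

# Workaround 2: .loc with an exact date returns that row as a Series,
# indexed by the column names.
as_series = y.loc["2008-01-01"]

# .loc also takes datetime objects for the same row.
same_row = y.loc[datetime.datetime(2008, 1, 1)]

# Less precise strings still work through .loc and return all matching rows.
january = y.loc["2008-01"]
```

Use the slice form when downstream code expects a DataFrame, and `.loc` when you want the row itself.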

answered Oct 06 '22 13:10

piRSquared

Reversing your DataFrame makes the indexing work:

Here is your .csv datafile:

Date,PETR4,CSNA3,VALE5
2008-01-01,0.0,0.0,0.0
2008-01-02,1.0,1.0,1.0
2008-01-03,7.0,7.0,7.0

Use the following incantation to read it into a DataFrame:

>>> import pandas as pd
>>> a = pd.read_csv('your.csv', index_col=0, parse_dates=True)
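To try this without creating a file, here is a sketch that feeds the same CSV text through `io.StringIO`, which stands in for 'your.csv' (an assumption for illustration only). It checks that `index_col=0` plus `parse_dates=True` really produces a DatetimeIndex sorted in increasing order.

```python
import io

import pandas as pd

# The answer's CSV contents; io.StringIO stands in for 'your.csv'.
csv_text = """Date,PETR4,CSNA3,VALE5
2008-01-01,0.0,0.0,0.0
2008-01-02,1.0,1.0,1.0
2008-01-03,7.0,7.0,7.0
"""

# index_col=0 makes the Date column the index; parse_dates=True converts it
# to a DatetimeIndex instead of leaving plain strings.
a = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=True)

# The dates keep the file's increasing order.
is_sorted = a.index.is_monotonic_increasing
```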

Then, try to index a row:

>>> a['2008-01-01']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 1969, in __getitem__
    return self._getitem_column(key)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 1976, in _getitem_column
    return self._get_item_cache(key)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/generic.py", line 1091, in _get_item_cache
    values = self._data.get(item)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/internals.py", line 3211, in get
    loc = self.items.get_loc(item)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/index.py", line 1759, in get_loc
    return self._engine.get_loc(key)
  File "pandas/index.pyx", line 137, in pandas.index.IndexEngine.get_loc (pandas/index.c:3979)
  File "pandas/index.pyx", line 157, in pandas.index.IndexEngine.get_loc (pandas/index.c:3843)
  File "pandas/hashtable.pyx", line 668, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12265)
  File "pandas/hashtable.pyx", line 676, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12216)
KeyError: '2008-01-01'

You end up with a KeyError traceback.

However, if you reverse it, like this:

>>> b = a[::-1]

Then the same lookup returns the expected row:

>>> b['2008-01-01']
            PETR4  CSNA3  VALE5
Date                           
2008-01-01      0      0      0

I do NOT know why this is the case. My guess is that it has something to do with the data being treated as a time series one way but not the other. Someone more knowledgeable should answer that.

Update: By RTFM, I discovered this page:

https://pandas.pydata.org/pandas-docs/stable/timeseries.html

If you find the section titled "Slice vs. Exact Match", there is a warning box that explains this behavior. When the string has the same precision as the index (here, days), pandas treats it as an exact match rather than a slice, and an exact match inside a DataFrame's [] is interpreted as a column name, hence the KeyError. For an unsorted (here, reversed) index, this doesn't happen. I still find this terribly confusing, but there you go...
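A small sketch of the rule from that docs section, assuming a reasonably recent pandas; the frames `daily` and `hourly` and their values are made up for illustration. Whether a string is a slice or an exact match depends on its precision relative to the index, and going through `.loc` avoids the column-name ambiguity of []. The last lines only check index order, the difference between `a` and `b` above; the sketch does not rely on `b['2008-01-01']` still working, since that may vary between pandas versions.

```python
import pandas as pd

# Daily-precision index, as in the question.
daily = pd.DataFrame(
    {"PETR4": [0.0, 1.0, 7.0]},
    index=pd.to_datetime(["2008-01-01", "2008-01-02", "2008-01-03"]),
)

# Same precision as the index (day): exact match, so .loc returns one row.
exact = daily.loc["2008-01-01"]

# Less precise than the index (month): slice, so .loc returns a DataFrame.
sliced = daily.loc["2008-01"]

# Hourly-precision index: now a day string is less precise than the index,
# so the same "2008-01-01" becomes a slice covering both readings that day.
hourly = pd.DataFrame(
    {"PETR4": [0.0, 1.0, 7.0]},
    index=pd.to_datetime(["2008-01-01 09:00", "2008-01-01 17:00", "2008-01-02 09:00"]),
)
day_slice = hourly.loc["2008-01-01"]

# Reversing the frame breaks increasing order; sort_index() restores it.
reversed_daily = daily.iloc[::-1]
restored = reversed_daily.sort_index()
```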

Edit: Changed the printout of b, which was wrong in the original.

Edit1: Updated with the explanation from the pandas documentation.

answered Oct 06 '22 12:10

bob_monsen

