
How to access pandas DataFrame datetime index using strings

Tags:

python

pandas

This is a very simple and practical question. I have the feeling that it must be a silly detail and that there should be similar questions, but I wasn't able to find them. If someone does, I'll happily delete this one.

The closest I found were these: pandas: iterating over DataFrame index with loc

How to select rows within a pandas dataframe based on time only when index is date and time

Anyway, the thing is, I have a datetime-indexed pandas DataFrame as follows:

In[81]: y
Out[81]: 
            PETR4  CSNA3  VALE5
2008-01-01    0.0    0.0    0.0
2008-01-02    1.0    1.0    1.0
2008-01-03    7.0    7.0    7.0

In[82]: y.index
Out[82]: DatetimeIndex(['2008-01-01', '2008-01-02', '2008-01-03'], dtype='datetime64[ns]', freq=None)

Oddly enough, I can't access its values using any of the following methods:

In[83]: y[datetime.datetime(2008,1,1)]
In[84]: y['2008-1-1']
In[85]: y['1/1/2008']

Each of them raises a KeyError.

Even weirder, the following methods DO work:

In[86]: y['2008']
Out[86]: 
            PETR4  CSNA3  VALE5
2008-01-01    0.0    0.0    0.0
2008-01-02    1.0    1.0    1.0
2008-01-03    7.0    7.0    7.0
In[87]: y['2008-1']
Out[87]: 
            PETR4  CSNA3  VALE5
2008-01-01    0.0    0.0    0.0
2008-01-02    1.0    1.0    1.0
2008-01-03    7.0    7.0    7.0

I'm fairly new to pandas so maybe I'm missing something here?

Pedro Braz asked Apr 26 '16 16:04


2 Answers

pandas takes what's inside the [] and decides what it should do with it. If it's a subset of column names, it'll return a DataFrame with those columns. If it's a range of index values, it'll return a subset of those rows. What it does not handle is a single index value.
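To make that concrete, here is a small sketch of the three cases. It rebuilds the frame from the question by hand (the construction via `pd.to_datetime` is my assumption, the question doesn't show how `y` was created), so treat it as an illustration rather than the asker's exact setup.

```python
import pandas as pd

# Rebuild the frame from the question (construction is assumed, not from the post).
y = pd.DataFrame(
    {"PETR4": [0.0, 1.0, 7.0], "CSNA3": [0.0, 1.0, 7.0], "VALE5": [0.0, 1.0, 7.0]},
    index=pd.to_datetime(["2008-01-01", "2008-01-02", "2008-01-03"]),
)

# A list of column names selects columns.
cols = y[["PETR4", "VALE5"]]

# A slice of index values selects rows (label slices include both endpoints).
rows = y["2008-01-01":"2008-01-02"]

# A single full date string is looked up as a column name, which does not exist.
try:
    y["2008-01-01"]
    raised = False
except KeyError:
    raised = True

print(cols.shape, rows.shape, raised)
```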

Solution

Two workarounds:

1. Turn the argument into something pandas interprets as a range.

df['2008-01-01':'2008-01-01']

2. Use the method designed to give you this result: loc[]

df.loc['2008-01-01']
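Here are both workarounds run against a frame shaped like the one in the question. The frame is rebuilt by hand (that part is an assumption on my side), and the `datetime.datetime` lookup is included because the question tried it with plain `[]`.

```python
import datetime

import pandas as pd

# Same data as in the question; how it was originally built is assumed.
df = pd.DataFrame(
    {"PETR4": [0.0, 1.0, 7.0], "CSNA3": [0.0, 1.0, 7.0], "VALE5": [0.0, 1.0, 7.0]},
    index=pd.to_datetime(["2008-01-01", "2008-01-02", "2008-01-03"]),
)

# Workaround 1: a one-day slice returns a DataFrame with a single row.
one_row_frame = df["2008-01-01":"2008-01-01"]

# Workaround 2: loc with a single label returns that row as a Series.
row = df.loc["2008-01-01"]

# loc also accepts a datetime object for the same lookup.
same_row = df.loc[datetime.datetime(2008, 1, 1)]

print(one_row_frame)
print(row)
```

Note the difference in return type: the slice gives you a one-row DataFrame, while loc with a single label gives you a Series indexed by column name.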

Link to the documentation

piRSquared answered Oct 06 '22 13:10


Reversing your dataframe allows the indexing to work:

Here is your .csv datafile:

Date,PETR4,CSNA3,VALE5
2008-01-01,0.0,0.0,0.0
2008-01-02,1.0,1.0,1.0
2008-01-03,7.0,7.0,7.0

Use the following incantation to read it into a DataFrame:

>>> a = pd.read_csv('your.csv', index_col=0, parse_dates=True, infer_datetime_format=True)

Then, try to index a row:

>>> a['2008-01-01']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 1969, in __getitem__
    return self._getitem_column(key)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 1976, in _getitem_column
    return self._get_item_cache(key)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/generic.py", line 1091, in _get_item_cache
    values = self._data.get(item)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/internals.py", line 3211, in get
    loc = self.items.get_loc(item)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/index.py", line 1759, in get_loc
    return self._engine.get_loc(key)
  File "pandas/index.pyx", line 137, in pandas.index.IndexEngine.get_loc (pandas/index.c:3979)
  File "pandas/index.pyx", line 157, in pandas.index.IndexEngine.get_loc (pandas/index.c:3843)
  File "pandas/hashtable.pyx", line 668, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12265)
  File "pandas/hashtable.pyx", line 676, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12216)
KeyError: '2008-01-01'

You end up with a KeyError traceback.

However, if you reverse it, like this:

>>> b = a[::-1]

Then try the same index, you get the proper result:

>>> b['2008-01-01']
            PETR4  CSNA3  VALE5
Date                           
2008-01-01      0      0      0

I do NOT know why this is the case. Chances are, it has something to do with being a time series one way, but not the other? Someone more knowledgeable should answer that.

Update: By RTFM, I discovered this page:

https://pandas.pydata.org/pandas-docs/stable/timeseries.html

If you find the section titled "Slice vs. Exact Match", there is a warning that explains this behavior. The problem seems to be that for a TimeSeries, an exact match is interpreted as a column name. For unsorted dataframes, this doesn't happen. See the warning box in the section referenced above. I still find this terribly confusing, but there you go...
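As far as I understand that section, the deciding factor is how precise the string is compared to the index: a string coarser than the index resolution is treated as a slice, while a string at the same resolution is treated as an exact match. A minimal sketch using loc, which always looks at the rows and so avoids the column-name ambiguity (the frame is rebuilt by hand here, which is my assumption):

```python
import pandas as pd

# Day-resolution index, same values as in the question (construction assumed).
a = pd.DataFrame(
    {"PETR4": [0.0, 1.0, 7.0], "CSNA3": [0.0, 1.0, 7.0], "VALE5": [0.0, 1.0, 7.0]},
    index=pd.to_datetime(["2008-01-01", "2008-01-02", "2008-01-03"]),
)

# Month-level string is coarser than the daily index: treated as a slice,
# so every row in January 2008 comes back as a DataFrame.
coarse = a.loc["2008-01"]

# Day-level string matches the index resolution: treated as an exact match,
# so a single row comes back as a Series.
exact = a.loc["2008-01-02"]

print(type(coarse).__name__, len(coarse))
print(type(exact).__name__)
```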

Edit: Changed the printout of b, which was wrong in the original.

Edit1: Updated with the explanation from the pandas documentation.

bob_monsen answered Oct 06 '22 12:10