This is a very simple and practical question. I have the feeling that it must be a silly detail and that there should be similar questions. I wasn't able to find them tho. If someone does I'll happily delete this one.
The closest I found were these: pandas: iterating over DataFrame index with loc
How to select rows within a pandas dataframe based on time only when index is date and time
anyway, the thing is, I have a datetime indexed panda dataframe as follows:
In[81]: y
Out[81]:
PETR4 CSNA3 VALE5
2008-01-01 0.0 0.0 0.0
2008-01-02 1.0 1.0 1.0
2008-01-03 7.0 7.0 7.0
In[82]: y.index
Out[82]: DatetimeIndex(['2008-01-01', '2008-01-02', '2008-01-03'], dtype='datetime64[ns]', freq=None)
Oddly enough, I can't access its values using none of the following methods:
In[83]: y[datetime.datetime(2008,1,1)]
In[84]: y['2008-1-1']
In[85]: y['1/1/2008']
I get the KeyError
error.
Even more weird is that the following methods DO work:
In[86]: y['2008']
Out[86]:
PETR4 CSNA3 VALE5
2008-01-01 0.0 0.0 0.0
2008-01-02 1.0 1.0 1.0
2008-01-03 7.0 7.0 7.0
In[87]: y['2008-1']
Out[87]:
PETR4 CSNA3 VALE5
2008-01-01 0.0 0.0 0.0
2008-01-02 1.0 1.0 1.0
2008-01-03 7.0 7.0 7.0
I'm fairly new to pandas so maybe I'm missing something here?
Accessing Element from Series with Position In order to access the series element refers to the index number. Use the index operator [ ] to access an element in a series. The index must be an integer. In order to access multiple elements from a series, we use Slice operation.
To get the index of a Pandas DataFrame, call DataFrame. index property. The DataFrame. index property returns an Index object representing the index of this DataFrame.
DatetimeIndex. class pandas. DatetimeIndex [source] Immutable ndarray of datetime64 data, represented internally as int64, and which can be boxed to Timestamp objects that are subclasses of datetime and carry metadata such as frequency information.
pandas is taking what's inside the []
and deciding what it should do. If it's a subset of column names, it'll return a DataFrame with those columns. If it's a range of index values, it'll return a subset of those rows. What is does not handle is taking a single index value.
Two work around's
1.Turn the argument into something pandas interprets as a range.
df['2008-01-01':'2008-01-01']
2.Use the method designed to give you this result. loc[]
df.loc['2008-01-01']
Link to the documentation
Reversing your dataframe allows the indexing to work:
Here is your .csv datafile:
Date,PETR4,CSNA3,VALE5
2008-01-01,0.0,0.0,0.0
2008-01-02,1.0,1.0,1.0
2008-01-03,7.0,7.0,7.0
Use the following incantation to read it into a DataFrame:
>>> a = pd.read_csv('your.csv', index_col=0, parse_dates=True, infer_datetime_format=True)
Then, try to index a row:
>>> a['2008-01-01']
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 1969, in __getitem__
return self._getitem_column(key)
File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 1976, in _getitem_column
return self._get_item_cache(key)
File "/usr/local/lib/python2.7/dist-packages/pandas/core/generic.py", line 1091, in _get_item_cache
values = self._data.get(item)
File "/usr/local/lib/python2.7/dist-packages/pandas/core/internals.py", line 3211, in get
loc = self.items.get_loc(item)
File "/usr/local/lib/python2.7/dist-packages/pandas/core/index.py", line 1759, in get_loc
return self._engine.get_loc(key)
File "pandas/index.pyx", line 137, in pandas.index.IndexEngine.get_loc (pandas/index.c:3979)
File "pandas/index.pyx", line 157, in pandas.index.IndexEngine.get_loc (pandas/index.c:3843)
File "pandas/hashtable.pyx", line 668, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12265)
File "pandas/hashtable.pyx", line 676, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12216)
KeyError: '2008-01-01'
You end up with a traceback with KeyError
However, if you reverse it, like this:
>>> b = a[::-1]
Then try the same index, you get the proper result:
>>> b['2008-01-01']
PETR4 CSNA3 VALE5
Date
2008-01-01 0 0 0
I do NOT know why this is the case. Chances are, it has something to do with being a time series one way, but not the other? Someone more knowledgeable should answer that.
Update: By RTFM, I discovered this page:
https://pandas.pydata.org/pandas-docs/stable/timeseries.html
If you find the section titled "Slice vs. Exact Match", there is a warning that explains this behavior. The problem seems to be that for a TimeSeries, an exact match is interpreted as a column name. For unsorted dataframes, this doesn't happen. See the warning box in the section referenced above. I still find this terribly confusing, but there you go...
Edit: Changed the printout of b, which was wrong in the original.
Edit1: Update with explanation in python documentation.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With