Unable to slice pandas dataframe (with date as key) using date as string

Tags:

I'm generating an empty dataframe with a series of dates as the index. Data will be added to the dataframe at a later point.

cbd=pd.date_range(start=pd.datetime(2017,01,02),end=pd.datetime(2017,01,30),period=1)

df = pd.DataFrame(data=None,columns=['Test1','Test2'],index=cbd)

df.head()
           Test1 Test2
2017-01-02   NaN   NaN
2017-01-03   NaN   NaN
2017-01-04   NaN   NaN
2017-01-05   NaN   NaN
2017-01-06   NaN   NaN

A few slicing methods don't seem to work. The following returns a KeyError:

df['2017-01-02']

However any of the following work:

df['2017-01-02':'2017-01-02']
df.loc['2017-01-02']

What am I missing here? Why doesn't the first slice return a result?

943

asked Jan 29 '17 15:01

3 Answers

Dual behavior of `[]` in `df[]`

When you don't use : inside [], then the value(s) inside it will be considered as column(s).
And when you use : inside [], then the value(s) inside it will be considered as row(s).

Why the dual nature?

Because most of the time people want to slice the rows instead of slicing the columns.

So they decided that x and y in df[x:y] should correspond to rows,

and x in d[x] or x, y in df[[x,y]] should correspond to column(s).

Example:

df = pd.DataFrame(data = [[1,2,3], [1,2,3], [1,2,3]],
                                 index = ['A','B','C'], columns = ['A','B','C'])
print df

Output:

Now when you do df['B'], it can mean 2 things:

Take the 2nd index B and give you the 2nd row 1 2 3
```
                 OR
```
Take the 2nd column B and give you the 2nd column 2 2 2.

So in order to resolve this conflict and keep it unambiguous df['B'] will always mean that you want the column 'B', if there is no such column then it will throw an Error.

Why does `df['2017-01-02']` fails?

It will search for a column '2017-01-02', Because there is no such column, it throws an error.

Why does `df.loc['2017-01-02']` works then?

Because .loc[] has syntax of df.loc[row,column] and you can leave out the column if you will, as in your case, it simply means df.loc[row]

answered Oct 20 '22 13:10

Mohammad Yusuf

There is difference, because use different approaches:

For select one row is necessary loc:

df['2017-01-02']

Docs - partial string indexing:

Warning

The following selection will raise a KeyError; otherwise this selection methodology would be inconsistent with other selection methods in pandas (as this is not a slice, nor does it resolve to one):

dft['2013-1-15 12:30:00']

To select a single row, use .loc

In [74]: dft.loc['2013-1-15 12:30:00']
Out[74]: 
A    0.193284
Name: 2013-01-15 12:30:00, dtype: float64

df['2017-01-02':'2017-01-02']

This is pure partial string indexing:

This type of slicing will work on a DataFrame with a DateTimeIndex as well. Since the partial string selection is a form of label slicing, the endpoints will be included. This would include matching times on an included date.

answered Oct 20 '22 15:10

jezrael

First I have updated your test data (just for info) as it returns an 'invalid token' error. Please see changes here:

cbd=pd.date_range(start='2017-01-02',end='2017-01-30',period=1)
df = pd.DataFrame(data=None,columns=['Test1','Test2'],index=cbd)

Now looking at the first row:

In[1]:

df.head(1)

Out[1]:
          Test1 Test2
2017-01-02  NaN NaN

And then trying the initial slicing approach yields this error:

In[2]:    

df['2017-01-02']

Out[2]:

KeyError: '2017-01-02'

Now try this using the column name:

In[3]:    

df.columns

Out[3]:

Index(['Test1', 'Test2'], dtype='object')

In[4]:

We try 'Test1':

df['Test1']

And get the NaN output from this column.

Out[4]:

2017-01-02    NaN
2017-01-03    NaN
2017-01-04    NaN
2017-01-05    NaN

So the format you are using is designed to be used on the column name unless you use this format df['2017-01-02':'2017-01-02'].

The Pandas docs state "The following selection will raise a KeyError; otherwise this selection methodology would be inconsistent with other selection methods in pandas (as this is not a slice, nor does it resolve to one)".

So as you correctly identified, DataFrame.loc is a label based indexer which yields the output you are looking for:

 In[5]:
df.loc['2017-01-02']

 Out[5]:

Test1    NaN
Test2    NaN
Name: 2017-01-02 00:00:00, dtype: object

answered Oct 20 '22 15:10

seemo

Related questions
                            
                                How to draw ellipsoid with plotly
                            
                                How to round Avg() to nearest integer? django
                            
                                Password Protecting Excel file using Python
                            
                                How to import own module in PyCharm console
                            
                                How to convert .dat to .csv using python?
                            
                                find the index of a boolean array whose values are true
                            
                                How to convert a list of tuples to a dictionary of dictionaries in Python?
                            
                                Pandas concatenate alternating columns
                            
                                Python scipy module import error due to missing ._ufuncs dll
                            
                                How can I use urllib.request.urlretrieve with python 2.7
                            
                                django.db.utils.OperationalError: no such table: auth_user
                            
                                Time complexity calculation for my algorithm
                            
                                @method_decorator(csrf_exempt) NameError: name 'method_decorator' is not defined
                            
                                How to find the maximum product of two elements in a list?
                            
                                Problems with Gst in python program
                            
                                Using request args in Flask for a variable URL
                            
                                503 Reponse when trying to use python request on local website
                            
                                Skipping Blank lines in read file python
                            
                                Django: Redirect already logged user by class based view
                            
                                Django Allowed_Hosts Error.

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Unable to slice pandas dataframe (with date as key) using date as string

Tags:

python

pandas

dataframe

datetimeindex

mpny1

People also ask

3 Answers

Dual behavior of `[]` in `df[]`

Why the dual nature?

Example:

Why does `df['2017-01-02']` fails?

Why does `df.loc['2017-01-02']` works then?

Mohammad Yusuf

jezrael

seemo

Recent Activity

Donate For Us

Unable to slice pandas dataframe (with date as key) using date as string

Tags:

python

pandas

dataframe

datetimeindex

mpny1

People also ask

3 Answers

Dual behavior of [] in df[]

Why the dual nature?

Example:

Why does df['2017-01-02'] fails?

Why does df.loc['2017-01-02'] works then?

Mohammad Yusuf

jezrael

seemo

Related questions

Recent Activity

Donate For Us

Dual behavior of `[]` in `df[]`

Why does `df['2017-01-02']` fails?

Why does `df.loc['2017-01-02']` works then?