I'm generating an empty dataframe with a series of dates as the index. Data will be added to the dataframe at a later point.
cbd=pd.date_range(start=pd.datetime(2017,01,02),end=pd.datetime(2017,01,30),period=1)
df = pd.DataFrame(data=None,columns=['Test1','Test2'],index=cbd)
df.head()
Test1 Test2
2017-01-02 NaN NaN
2017-01-03 NaN NaN
2017-01-04 NaN NaN
2017-01-05 NaN NaN
2017-01-06 NaN NaN
A few slicing methods don't seem to work. The following returns a KeyError:
df['2017-01-02']
However any of the following work:
df['2017-01-02':'2017-01-02']
df.loc['2017-01-02']
What am I missing here? Why doesn't the first slice return a result?
Pandas has a built-in function called to_datetime()that converts date and time in string format to a DateTime object. As you can see, the 'date' column in the DataFrame is currently of a string-type object. Thus, to_datetime() converts the column to a series of the appropriate datetime64 dtype.
[]
in df[]
:
inside []
, then the value(s) inside it will be considered as column(s).:
inside []
, then the value(s) inside it will be considered as row(s).Because most of the time people want to slice the rows instead of slicing the columns.
So they decided that x
and y
in df[x:y]
should correspond to rows,
and x
in d[x]
or x
, y
in df[[x,y]]
should correspond to column(s).
df = pd.DataFrame(data = [[1,2,3], [1,2,3], [1,2,3]],
index = ['A','B','C'], columns = ['A','B','C'])
print df
Output:
A B C
A 1 2 3
B 1 2 3
C 1 2 3
Now when you do df['B']
, it can mean 2 things:
Take the 2nd index B
and give you the 2nd row 1 2 3
OR
Take the 2nd column B
and give you the 2nd column 2 2 2
.
So in order to resolve this conflict and keep it unambiguous df['B']
will always mean that you want the column 'B'
, if there is no such column then it will throw an Error.
df['2017-01-02']
fails?It will search for a column '2017-01-02'
, Because there is no such column, it throws an error.
df.loc['2017-01-02']
works then?Because .loc[]
has syntax of df.loc[row,column]
and you can leave out the column if you will, as in your case, it simply means df.loc[row]
There is difference, because use different approaches:
For select one row is necessary loc
:
df['2017-01-02']
Docs - partial string indexing:
Warning
The following selection will raise a KeyError; otherwise this selection methodology would be inconsistent with other selection methods in pandas (as this is not a slice, nor does it resolve to one):
dft['2013-1-15 12:30:00']
To select a single row, use .loc
In [74]: dft.loc['2013-1-15 12:30:00']
Out[74]:
A 0.193284
Name: 2013-01-15 12:30:00, dtype: float64
df['2017-01-02':'2017-01-02']
This is pure partial string indexing:
This type of slicing will work on a
DataFrame
with a DateTimeIndex as well. Since the partial string selection is a form of label slicing, the endpoints will be included. This would include matching times on an included date.
First I have updated your test data (just for info) as it returns an 'invalid token' error. Please see changes here:
cbd=pd.date_range(start='2017-01-02',end='2017-01-30',period=1)
df = pd.DataFrame(data=None,columns=['Test1','Test2'],index=cbd)
Now looking at the first row:
In[1]:
df.head(1)
Out[1]:
Test1 Test2
2017-01-02 NaN NaN
And then trying the initial slicing approach yields this error:
In[2]:
df['2017-01-02']
Out[2]:
KeyError: '2017-01-02'
Now try this using the column
name:
In[3]:
df.columns
Out[3]:
Index(['Test1', 'Test2'], dtype='object')
In[4]:
We try 'Test1':
df['Test1']
And get the NaN
output from this column.
Out[4]:
2017-01-02 NaN
2017-01-03 NaN
2017-01-04 NaN
2017-01-05 NaN
So the format you are using is designed to be used on the column
name unless you use this format df['2017-01-02':'2017-01-02']
.
The Pandas docs state "The following selection will raise a KeyError; otherwise this selection methodology would be inconsistent with other selection methods in pandas (as this is not a slice, nor does it resolve to one)".
So as you correctly identified, DataFrame.loc is a label based indexer which yields the output you are looking for:
In[5]:
df.loc['2017-01-02']
Out[5]:
Test1 NaN
Test2 NaN
Name: 2017-01-02 00:00:00, dtype: object
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With