I'm generating an empty dataframe with a series of dates as the index. Data will be added to the dataframe at a later point.
cbd=pd.date_range(start=pd.datetime(2017,01,02),end=pd.datetime(2017,01,30),period=1)
df = pd.DataFrame(data=None,columns=['Test1','Test2'],index=cbd)
df.head()
           Test1 Test2
2017-01-02   NaN   NaN
2017-01-03   NaN   NaN
2017-01-04   NaN   NaN
2017-01-05   NaN   NaN
2017-01-06   NaN   NaN
A few slicing methods don't seem to work. The following returns a KeyError:
df['2017-01-02']
However any of the following work:
df['2017-01-02':'2017-01-02']
df.loc['2017-01-02']
What am I missing here? Why doesn't the first slice return a result?
Pandas has a built-in function called to_datetime()that converts date and time in string format to a DateTime object. As you can see, the 'date' column in the DataFrame is currently of a string-type object. Thus, to_datetime() converts the column to a series of the appropriate datetime64 dtype.
[] in df[]
: inside [], then the value(s) inside it will be considered as column(s).: inside [], then the value(s) inside it will be considered as row(s).Because most of the time people want to slice the rows instead of slicing the columns.
So they decided that x and y in df[x:y] should correspond to rows,
and x in d[x] or x, y in df[[x,y]] should correspond to column(s).
df = pd.DataFrame(data = [[1,2,3], [1,2,3], [1,2,3]],
                                 index = ['A','B','C'], columns = ['A','B','C'])
print df
Output:
   A  B  C
A  1  2  3
B  1  2  3
C  1  2  3
Now when you do df['B'], it can mean 2 things:
Take the 2nd index B and give you the 2nd row 1 2 3
                 OR
Take the 2nd column B and give you the 2nd column 2 2 2.
So in order to resolve this conflict and keep it unambiguous df['B'] will always mean that you want the column 'B', if there is no such column then it will throw an Error.
df['2017-01-02'] fails?It will search for a column '2017-01-02', Because there is no such column, it throws an error.
df.loc['2017-01-02'] works then?Because .loc[] has  syntax of df.loc[row,column] and you can leave out the column if you will, as in your case, it simply means df.loc[row]
There is difference, because use different approaches:
For select one row is necessary loc:
df['2017-01-02']
Docs - partial string indexing:
Warning
The following selection will raise a KeyError; otherwise this selection methodology would be inconsistent with other selection methods in pandas (as this is not a slice, nor does it resolve to one):
dft['2013-1-15 12:30:00']
To select a single row, use .loc
In [74]: dft.loc['2013-1-15 12:30:00']
Out[74]: 
A    0.193284
Name: 2013-01-15 12:30:00, dtype: float64
df['2017-01-02':'2017-01-02']
This is pure partial string indexing:
This type of slicing will work on a
DataFramewith a DateTimeIndex as well. Since the partial string selection is a form of label slicing, the endpoints will be included. This would include matching times on an included date.
First I have updated your test data (just for info) as it returns an 'invalid token' error. Please see changes here:
cbd=pd.date_range(start='2017-01-02',end='2017-01-30',period=1)
df = pd.DataFrame(data=None,columns=['Test1','Test2'],index=cbd)
Now looking at the first row:
In[1]:
df.head(1)
Out[1]:
          Test1 Test2
2017-01-02  NaN NaN
And then trying the initial slicing approach yields this error:
In[2]:    
df['2017-01-02']
Out[2]:
KeyError: '2017-01-02'
Now try this using the column name:
In[3]:    
df.columns
Out[3]:
Index(['Test1', 'Test2'], dtype='object')
In[4]:
We try 'Test1':
df['Test1']
And get the NaN output from this column.  
Out[4]:
2017-01-02    NaN
2017-01-03    NaN
2017-01-04    NaN
2017-01-05    NaN
So the format you are using is designed to be used on the column name unless you use this format df['2017-01-02':'2017-01-02'].
The Pandas docs state "The following selection will raise a KeyError; otherwise this selection methodology would be inconsistent with other selection methods in pandas (as this is not a slice, nor does it resolve to one)".
So as you correctly identified, DataFrame.loc is a label based indexer which yields the output you are looking for:
 In[5]:
df.loc['2017-01-02']
 Out[5]:
Test1    NaN
Test2    NaN
Name: 2017-01-02 00:00:00, dtype: object
                        If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With