I am trying to learn and UNDERSTAND the logic behind pandas loc, and I cannot come up with an explanation for the following issue. If I want to slice the DataFrame with loc, I can do it this way:
import numpy as np
import pandas as pd
dates=pd.date_range('20130101',periods=6)
df=pd.DataFrame(np.random.rand(6,4),index=dates,columns=list('ABCD'))
print(df.loc['20130102':'20130104',['A','B']])  # (*)
All fine. What I don't understand is this: if in the second part of loc I can enter a list of column headers ['A','B'], why can't I do the same for the rows? E.g., the following does not work:
print(df.loc[['20130102','20130104'],['A','B']])
What is the logic behind this? I am giving loc two row indexes and two column headers, but it does not work: for the columns a list is fine, for the rows it is not. Moreover, in the first part of loc (see *) the rows can be selected with ":", i.e. from index1 to index2 = 'index1':'index2', but I cannot do the same in the second part of loc. This does not work:
print(df.loc['20130102':'20130104',['A':'C']])
I would be glad to understand why those two lines of code do not work.
Thanks.
You need to convert the values of the list to datetimes, because the index is a DatetimeIndex. The values in the list must have the same type as the values of the index/columns of the DataFrame, otherwise loc can raise a KeyError:
print(df.loc[pd.to_datetime(['20130102','20130104']),['A','B']])
                   A         B
2013-01-02  0.719469  0.423106
2013-01-04  0.438572  0.059678
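A minimal, self-contained sketch of the same idea. It rebuilds the question's frame with random data (so only shapes and labels are checked, not values) and shows that pd.Timestamp objects work just like the output of pd.to_datetime, because both are datetime labels of the same type as the DatetimeIndex:

```python
import numpy as np
import pandas as pd

# Same frame as in the question: a DatetimeIndex with 6 daily dates
dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.random.rand(6, 4), index=dates, columns=list('ABCD'))

# Labels converted to the index's type (datetime64) match the DatetimeIndex
rows = pd.to_datetime(['20130102', '20130104'])
sub = df.loc[rows, ['A', 'B']]

# pd.Timestamp objects are also datetime labels, so they match as well
same = df.loc[[pd.Timestamp('2013-01-02'), pd.Timestamp('2013-01-04')], ['A', 'B']]

print(sub.equals(same))  # True
print(sub.shape)         # (2, 2)
```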
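Selecting by the first and last value of the index/columns
Converting to datetimes is not necessary here, because of partial string indexing: when slicing a DatetimeIndex, pandas parses the string endpoints as dates, and a loc label slice includes both endpoints.
To select a range of columns, only remove the list [] around the column slice. ['A':'C'] is not valid Python at all, because the a:b slice syntax is only allowed directly inside a subscript, not inside a list literal: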
print(df.loc['20130102':'20130104','A':'C'])
                   A         B         C
2013-01-02  0.719469  0.423106  0.980764
2013-01-03  0.480932  0.392118  0.343178
2013-01-04  0.438572  0.059678  0.398044
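To make the "slice vs. list" point concrete, here is a small sketch (again with random data standing in for the question's frame). It assumes only standard Python and pandas behaviour: inside a subscript, 'A':'C' is sugar for the built-in slice('A', 'C'), which loc also accepts directly, while putting that syntax inside a list literal is rejected by the Python parser before pandas ever runs:

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.random.rand(6, 4), index=dates, columns=list('ABCD'))

# Inside a subscript, 'A':'C' is shorthand for slice('A', 'C')
by_syntax = df.loc['20130102':'20130104', 'A':'C']
by_object = df.loc[slice('20130102', '20130104'), slice('A', 'C')]
print(by_syntax.equals(by_object))  # True

# Label slices in loc include BOTH endpoints: columns A, B and C
print(list(by_syntax.columns))  # ['A', 'B', 'C']

# A list literal cannot hold slice syntax, so ['A':'C'] fails at parse time
try:
    compile("df.loc[:, ['A':'C']]", "<example>", "eval")
except SyntaxError:
    print("SyntaxError")
```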
The same works with dashed date strings:
print(df.loc['2013-01-02':'2013-01-04','A':'C'])
                   A         B         C
2013-01-02  0.719469  0.423106  0.980764
2013-01-03  0.480932  0.392118  0.343178
2013-01-04  0.438572  0.059678  0.398044
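A short sketch of partial string indexing on the question's frame (random values, so the checks below only look at shapes). Both string formats are parsed against the DatetimeIndex, and a string that names a whole period, such as a month, selects every row inside it:

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.random.rand(6, 4), index=dates, columns=list('ABCD'))

# Both string formats are parsed as dates when slicing the DatetimeIndex
compact = df.loc['20130102':'20130104', 'A':'C']
dashed = df.loc['2013-01-02':'2013-01-04', 'A':'C']
print(compact.equals(dashed))  # True

# A partial date string selects every row that falls inside that period
january = df.loc['2013-01']
print(len(january))  # 6, since all six dates are in January 2013
```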
Combinations:
# select between the start/end datetimes and only columns A and C
print(df.loc['20130102':'20130104',['A','C']])
                   A         C
2013-01-02  0.719469  0.980764
2013-01-03  0.480932  0.343178
2013-01-04  0.438572  0.398044
# select only the rows 20130102 and 20130104, and the columns from A to C
print(df.loc[pd.to_datetime(['20130102','20130104']),'A':'C'])
                   A         B         C
2013-01-02  0.719469  0.423106  0.980764
2013-01-04  0.438572  0.059678  0.398044
This code shows the different ways to access (slice) data with loc in pandas:
import numpy as np
import pandas as pd
df=pd.DataFrame(np.random.rand(6,4),index=['row1','row2','row3','row4','row5','row6'],columns=list('ABCD'))
             A         B         C         D
row1  0.972614  0.193116  0.448413  0.731300
row2  0.135391  0.783295  0.959058  0.107872
row3  0.966703  0.742793  0.852716  0.710681
row4  0.976819  0.920898  0.665329  0.078999
row5  0.418717  0.122677  0.716004  0.977522
row6  0.101422  0.641862  0.157751  0.888720
Row range, column range:
df.loc['row1':'row3', 'A':'C']
             A         B         C
row1  0.972614  0.193116  0.448413
row2  0.135391  0.783295  0.959058
row3  0.966703  0.742793  0.852716
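One detail worth seeing side by side: a loc slice works on labels and includes the end label, whereas iloc works on positions and, like ordinary Python slicing, excludes the end position. A minimal sketch on the same kind of frame (random values, so only labels are checked):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(6, 4),
                  index=['row1', 'row2', 'row3', 'row4', 'row5', 'row6'],
                  columns=list('ABCD'))

# loc slices by label and includes the end label
by_label = df.loc['row1':'row3', 'A':'C']

# iloc slices by position and excludes the end position
by_position = df.iloc[0:3, 0:3]

print(by_label.equals(by_position))  # True
print(list(by_label.index))          # ['row1', 'row2', 'row3']
```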
Row list, column range:
df.loc[['row1','row3'], 'A':'C']
             A         B         C
row1  0.972614  0.193116  0.448413
row3  0.966703  0.742793  0.852716
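A list of row labels behaves differently from a range: it picks exactly the listed labels, in the order you give them. The sketch below also assumes pandas 1.0 or later, where a label in the list that is missing from the index raises a KeyError (older versions warned and returned NaN rows instead):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(6, 4),
                  index=['row1', 'row2', 'row3', 'row4', 'row5', 'row6'],
                  columns=list('ABCD'))

# A list picks exactly those labels, in the order given (not index order)
picked = df.loc[['row3', 'row1'], 'A':'C']
print(list(picked.index))  # ['row3', 'row1']

# Every label in the list must exist in the index; 'row9' does not
try:
    df.loc[['row1', 'row9'], 'A':'C']
except KeyError:
    print("KeyError")
```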
Row range, column list:
df.loc['row1':'row3', ['A','C']]
             A         C
row1  0.972614  0.448413
row2  0.135391  0.959058
row3  0.966703  0.852716
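The brackets also change the shape of the result. In this sketch (random data, same labels as above), a bare column label drops the column axis and returns a Series, while the same label wrapped in a list keeps it and returns a one-column DataFrame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(6, 4),
                  index=['row1', 'row2', 'row3', 'row4', 'row5', 'row6'],
                  columns=list('ABCD'))

# A single column label drops that axis and returns a Series
as_series = df.loc['row1':'row3', 'A']

# The same label wrapped in a list keeps the axis and returns a DataFrame
as_frame = df.loc['row1':'row3', ['A']]

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame
print(as_frame.shape)            # (3, 1)
```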
Single value:
df.loc['row1','A']
0.972614309371533
Conclusion: when selecting a range, don't put it inside []; use [] only to pass a list of labels.