Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas indexing by both boolean `loc` and subsequent `iloc`

Tags:

python

pandas

I want to index a Pandas dataframe using a boolean mask, then set a value in a subset of the filtered dataframe based on an integer index, and have this value reflected in the dataframe. That is, I would be happy if this worked on a view of the dataframe.

Example:

In [293]:

df = pd.DataFrame({'a': [0, 1, 2, 3, 4, 5, 6, 7],
                   'b': [5, 5, 2, 2, 5, 5, 2, 2],
                   'c': [0, 0, 0, 0, 0, 0, 0, 0]})

mask = (df['a'] < 7) & (df['b'] == 2)
df.loc[mask, 'c']

Out[293]:
2    0
3    0
6    0
Name: c, dtype: int64

Now I would like to set the values of the first two elements returned in the filtered dataframe. Chaining an iloc onto the loc call above works to index:

In [294]:

df.loc[mask, 'c'].iloc[0: 2]

Out[294]:

2    0
3    0
Name: c, dtype: int64

But not to assign:

In [295]:

df.loc[mask, 'c'].iloc[0: 2] = 1

print(df)

   a  b  c
0  0  5  0
1  1  5  0
2  2  2  0
3  3  2  0
4  4  5  0
5  5  5  0
6  6  2  0
7  7  2  0

Making the assign value the same length as the slice (i.e. = [1, 1]) also doesn't work. Is there a way to assign these values?

like image 947
tsawallis Avatar asked Apr 13 '15 14:04

tsawallis


People also ask

Can I use ILOC and loc together?

In this case, loc and iloc are interchangeable when selecting via a single value or a list of values. Note that loc and iloc will return different results when selecting via slice and conditions.

Which indexing is possible in pandas DataFrame a ILOC [] b loc C boolean indexing?

Boolean indexing helps us to select the data from the DataFrames using a boolean vector. We need a DataFrame with a boolean index to use the boolean indexing.

Are loc and ILOC methods in pandas?

loc() and iloc() are one of those methods. These are used in slicing data from the Pandas DataFrame. They help in the convenient selection of data from the DataFrame in Python. They are used in filtering the data according to some conditions.


3 Answers

This does work but is a little ugly, basically we use the index generated from the mask and make an additional call to loc:

In [57]:

df.loc[df.loc[mask,'c'].iloc[0:2].index, 'c'] = 1
df
Out[57]:
   a  b  c
0  0  5  0
1  1  5  0
2  2  2  1
3  3  2  1
4  4  5  0
5  5  5  0
6  6  2  0
7  7  2  0

So breaking the above down:

In [60]:
# take the index from the mask and iloc
df.loc[mask, 'c'].iloc[0: 2]
Out[60]:
2    0
3    0
Name: c, dtype: int64
In [61]:
# call loc using this index, we can now use this to select column 'c' and set the value
df.loc[df.loc[mask,'c'].iloc[0:2].index]
Out[61]:
   a  b  c
2  2  2  0
3  3  2  0
like image 191
EdChum Avatar answered Oct 22 '22 15:10

EdChum


How about.

ix = df.index[mask][:2]
df.loc[ix, 'c'] = 1

Same idea as EdChum but more elegant as suggested in the comment.

EDIT: Have to be a little bit careful with this one as it may give unwanted results with a non-unique index, since there could be multiple rows indexed by either of the label in ix above. If the index is non-unique and you only want the first 2 (or n) rows that satisfy the boolean key, it would be safer to use .iloc with integer indexing with something like

ix = np.where(mask)[0][:2]
df.iloc[ix, 'c'] = 1
like image 37
JoeCondron Avatar answered Oct 22 '22 13:10

JoeCondron


I don't know if this is any more elegant, but it's a little different:

mask = mask & (mask.cumsum() < 3)

df.loc[mask, 'c'] = 1

   a  b  c
0  0  5  0
1  1  5  0
2  2  2  1
3  3  2  1
4  4  5  0
5  5  5  0
6  6  2  0
7  7  2  0
like image 2
JohnE Avatar answered Oct 22 '22 15:10

JohnE