I want to index a Pandas dataframe using a boolean mask, then set a value in a subset of the filtered dataframe based on an integer index, and have this value reflected in the dataframe. That is, I would be happy if this worked on a view of the dataframe.
Example:
In [293]:
df = pd.DataFrame({'a': [0, 1, 2, 3, 4, 5, 6, 7],
'b': [5, 5, 2, 2, 5, 5, 2, 2],
'c': [0, 0, 0, 0, 0, 0, 0, 0]})
mask = (df['a'] < 7) & (df['b'] == 2)
df.loc[mask, 'c']
Out[293]:
2 0
3 0
6 0
Name: c, dtype: int64
Now I would like to set the values of the first two elements returned in the filtered dataframe. Chaining an iloc
onto the loc
call above works to index:
In [294]:
df.loc[mask, 'c'].iloc[0: 2]
Out[294]:
2 0
3 0
Name: c, dtype: int64
But not to assign:
In [295]:
df.loc[mask, 'c'].iloc[0: 2] = 1
print(df)
a b c
0 0 5 0
1 1 5 0
2 2 2 0
3 3 2 0
4 4 5 0
5 5 5 0
6 6 2 0
7 7 2 0
Making the assign value the same length as the slice (i.e. = [1, 1]
) also doesn't work. Is there a way to assign these values?
In this case, loc and iloc are interchangeable when selecting via a single value or a list of values. Note that loc and iloc will return different results when selecting via slice and conditions.
Boolean indexing helps us to select the data from the DataFrames using a boolean vector. We need a DataFrame with a boolean index to use the boolean indexing.
loc() and iloc() are one of those methods. These are used in slicing data from the Pandas DataFrame. They help in the convenient selection of data from the DataFrame in Python. They are used in filtering the data according to some conditions.
This does work but is a little ugly, basically we use the index generated from the mask and make an additional call to loc
:
In [57]:
df.loc[df.loc[mask,'c'].iloc[0:2].index, 'c'] = 1
df
Out[57]:
a b c
0 0 5 0
1 1 5 0
2 2 2 1
3 3 2 1
4 4 5 0
5 5 5 0
6 6 2 0
7 7 2 0
So breaking the above down:
In [60]:
# take the index from the mask and iloc
df.loc[mask, 'c'].iloc[0: 2]
Out[60]:
2 0
3 0
Name: c, dtype: int64
In [61]:
# call loc using this index, we can now use this to select column 'c' and set the value
df.loc[df.loc[mask,'c'].iloc[0:2].index]
Out[61]:
a b c
2 2 2 0
3 3 2 0
How about.
ix = df.index[mask][:2]
df.loc[ix, 'c'] = 1
Same idea as EdChum but more elegant as suggested in the comment.
EDIT: Have to be a little bit careful with this one as it may give unwanted results with a non-unique index, since there could be multiple rows indexed by either of the label in ix
above. If the index is non-unique and you only want the first 2 (or n) rows that satisfy the boolean key, it would be safer to use .iloc
with integer indexing with something like
ix = np.where(mask)[0][:2]
df.iloc[ix, 'c'] = 1
I don't know if this is any more elegant, but it's a little different:
mask = mask & (mask.cumsum() < 3)
df.loc[mask, 'c'] = 1
a b c
0 0 5 0
1 1 5 0
2 2 2 1
3 3 2 1
4 4 5 0
5 5 5 0
6 6 2 0
7 7 2 0
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With