Pandas indexing by both boolean `loc` and subsequent `iloc`

I want to index a Pandas DataFrame using a boolean mask, then set a value in a subset of the filtered rows chosen by integer position, and have that value reflected in the original DataFrame. In other words, I would be happy if this worked on a view of the DataFrame.

Example:

In [293]:

import pandas as pd

df = pd.DataFrame({'a': [0, 1, 2, 3, 4, 5, 6, 7],
                   'b': [5, 5, 2, 2, 5, 5, 2, 2],
                   'c': [0, 0, 0, 0, 0, 0, 0, 0]})

mask = (df['a'] < 7) & (df['b'] == 2)
df.loc[mask, 'c']

Out[293]:
2    0
3    0
6    0
Name: c, dtype: int64

Now I would like to set the values of the first two elements returned by the filter. Chaining an iloc onto the loc call above works for selection:

In [294]:

df.loc[mask, 'c'].iloc[0: 2]

Out[294]:

2    0
3    0
Name: c, dtype: int64

But it does not work for assignment:

In [295]:

df.loc[mask, 'c'].iloc[0: 2] = 1

print(df)

   a  b  c
0  0  5  0
1  1  5  0
2  2  2  0
3  3  2  0
4  4  5  0
5  5  5  0
6  6  2  0
7  7  2  0

Making the assigned value the same length as the slice (i.e. = [1, 1]) doesn't work either. Is there a way to assign these values?


asked Apr 13 '15 14:04

tsawallis

3 Answers

This works but is a little ugly. We take the index produced by the mask and iloc selection, then pass it to a second call to loc:

In [57]:

df.loc[df.loc[mask,'c'].iloc[0:2].index, 'c'] = 1
df
Out[57]:
   a  b  c
0  0  5  0
1  1  5  0
2  2  2  1
3  3  2  1
4  4  5  0
5  5  5  0
6  6  2  0
7  7  2  0

So breaking the above down:

In [60]:
# take the index from the mask and iloc
df.loc[mask, 'c'].iloc[0: 2]
Out[60]:
2    0
3    0
Name: c, dtype: int64
In [61]:
# call loc using this index, we can now use this to select column 'c' and set the value
df.loc[df.loc[mask,'c'].iloc[0:2].index]
Out[61]:
   a  b  c
2  2  2  0
3  3  2  0
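
The breakdown above can be collected into one self-contained script. This is only a sketch built from the question's example frame; the variable names `first_two` and `result` are introduced here for illustration and are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'a': [0, 1, 2, 3, 4, 5, 6, 7],
                   'b': [5, 5, 2, 2, 5, 5, 2, 2],
                   'c': [0, 0, 0, 0, 0, 0, 0, 0]})
mask = (df['a'] < 7) & (df['b'] == 2)

# Step 1: collect the index labels of the first two rows that satisfy the mask.
first_two = df.loc[mask, 'c'].iloc[0:2].index

# Step 2: assign through a single .loc call on df itself, so the write
# lands in the original frame rather than in a temporary selection.
df.loc[first_two, 'c'] = 1

result = df['c'].tolist()
print(result)
```

The point of the two steps is that the chained selection is only used to read labels; the assignment itself is never chained.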


answered Oct 22 '22 15:10

EdChum

How about:

ix = df.index[mask][:2]
df.loc[ix, 'c'] = 1

This is the same idea as EdChum's answer but more elegant, as suggested in the comments.

EDIT: You have to be a little careful with this one, as it may give unwanted results with a non-unique index: more than one row could be indexed by any of the labels in ix above. If the index is non-unique and you only want the first 2 (or n) rows that satisfy the boolean key, it is safer to use .iloc with integer positions, with something like:

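import numpy as np
ix = np.where(mask)[0][:2]  # integer positions of the first two True entries
df.iloc[ix, df.columns.get_loc('c')] = 1  # .iloc needs a column position, not the label 'c'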
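
To make the non-unique index caveat concrete, here is a small sketch. The frame below is made up for illustration (it is not the question's data), and the names `by_label` and `by_pos` are assumptions introduced here. It compares the label-based and the position-based assignment on an index with repeated labels.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose index repeats the labels 0 and 1.
df = pd.DataFrame({'b': [5, 2, 2, 2], 'c': [0, 0, 0, 0]},
                  index=[0, 0, 1, 1])
mask = df['b'] == 2  # True at positions 1, 2 and 3

# Label-based: the first two matching rows carry labels 0 and 1,
# but those labels also address rows outside the intended selection.
by_label = df.copy()
ix = by_label.index[mask.to_numpy()][:2]
by_label.loc[ix, 'c'] = 1

# Position-based: only the first two matching rows are touched.
by_pos = df.copy()
pos = np.flatnonzero(mask.to_numpy())[:2]
by_pos.iloc[pos, by_pos.columns.get_loc('c')] = 1

print(by_label['c'].tolist())
print(by_pos['c'].tolist())
```

With this data the label-based version writes to all four rows, while the position-based version writes only to the rows at positions 1 and 2.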

answered Oct 22 '22 13:10

JoeCondron

I don't know if this is any more elegant, but it's a little different:

mask = mask & (mask.cumsum() < 3)

df.loc[mask, 'c'] = 1

   a  b  c
0  0  5  0
1  1  5  0
2  2  2  1
3  3  2  1
4  4  5  0
5  5  5  0
6  6  2  0
7  7  2  0
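
The same trick can be wrapped for an arbitrary n. This is a minimal sketch; the helper name `first_n_true` is hypothetical and not a pandas function. It relies on the running count of True values, so it works on row positions and does not depend on the index being unique.

```python
import pandas as pd

def first_n_true(mask: pd.Series, n: int) -> pd.Series:
    """Return a boolean mask keeping only the first n True entries of `mask`."""
    # cumsum() counts the True values seen so far; rows past the n-th
    # True entry have a running count greater than n and are dropped.
    return mask & (mask.cumsum() <= n)

df = pd.DataFrame({'a': [0, 1, 2, 3, 4, 5, 6, 7],
                   'b': [5, 5, 2, 2, 5, 5, 2, 2],
                   'c': [0, 0, 0, 0, 0, 0, 0, 0]})
mask = (df['a'] < 7) & (df['b'] == 2)

df.loc[first_n_true(mask, 2), 'c'] = 1
result = df['c'].tolist()
print(result)
```

Because the result is still a boolean mask aligned with the frame, it can be passed straight to a single `df.loc[...] = value` assignment.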

answered Oct 22 '22 15:10

JohnE

Tags: python, pandas