I have a dataframe like the following:
import pandas as pd
df = pd.DataFrame([[1, 2], [10, 20], [10, 2], [1, 40]], columns=['a', 'b'])
    a   b
0   1   2
1  10  20
2  10   2
3   1  40
I want to select the b column where a == 1; the following is the classic way to do it:
df[df.a == 1].b
0     2
3    40
Name: b, dtype: int64
Then I want to select the i-th row of this sub-dataframe, which is not necessarily the row with index label i. Again there are several ways, for example:
df[df.a == 1].b.iloc[[1]]
Output:
3    40
Name: b, dtype: int64
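To make the position-versus-label distinction concrete, here is a minimal sketch that takes the second matching row by position with iloc and also reads off that row's label in the original frame. The variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [10, 20], [10, 2], [1, 40]], columns=['a', 'b'])

sub = df[df.a == 1].b             # subset keeps the original labels 0 and 3

second_by_position = sub.iloc[1]  # 2nd row of the subset, chosen by position
label_of_second = sub.index[1]    # that row's label in the original df
print(second_by_position, label_of_second)  # 40 3
```

Note that sub.loc[1] would not work here: label 1 is not in the subset, which is exactly why the row has to be picked by position.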
So far so good. The problem comes when I try to modify the value I selected: this selection method returns a copy of a slice of the dataframe, not the original object, so I can't modify it in place.
df[df.a == 1].b.iloc[[1]] = 3
SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
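A quick way to confirm that the chained assignment never reaches the original frame is to run it with warnings silenced and then inspect df. This is a sketch: depending on the pandas version the warning may be SettingWithCopyWarning or, under copy-on-write, a chained-assignment warning, so it silences all warnings rather than naming a specific class.

```python
import warnings

import pandas as pd

df = pd.DataFrame([[1, 2], [10, 20], [10, 2], [1, 40]], columns=['a', 'b'])

with warnings.catch_warnings():
    # Silence the warning so only the effect (or lack of one) is visible.
    warnings.simplefilter("ignore")
    df[df.a == 1].b.iloc[[1]] = 3  # chained: writes into a temporary copy

print(df.loc[3, 'b'])  # still 40: the original frame was not modified
```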
I don't know which step produces the copy, since both of the following raise the same warning:
df.iloc[[3]].b = 3
df[df.a == 1].b = 3
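Each of the two chained forms above has a single-call equivalent that assigns directly on the frame. A minimal sketch, applied to separate copies so the two effects don't overwrite each other; df.columns.get_loc turns the column label 'b' into the integer position that iloc expects.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [10, 20], [10, 2], [1, 40]], columns=['a', 'b'])

# Instead of df.iloc[[3]].b = 3: one iloc call with row and column positions.
fixed_by_position = df.copy()
fixed_by_position.iloc[3, df.columns.get_loc('b')] = 3

# Instead of df[df.a == 1].b = 3: one loc call with the mask and the column label.
fixed_by_mask = df.copy()
fixed_by_mask.loc[df.a == 1, 'b'] = 3

print(fixed_by_position['b'].tolist())  # [2, 20, 2, 3]
print(fixed_by_mask['b'].tolist())      # [3, 20, 2, 3]
```

Neither of these addresses the combined case (mask plus position within the masked rows), which is what the question asks next.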
So my question is: how can I change a value using both a mask selection (conditional on the value in column a) and a row selection (by the row's position in the sub-dataframe, not its index label)?
Use loc with the boolean mask and pass the resulting index label directly back up:
In[178]:
df.loc[df.loc[df['a'] == 1,'b'].index[1], 'b'] = 3
df
Out[178]:
    a   b
0   1   2
1  10  20
2  10   2
3   1   3
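If this is needed more than once, the one-liner can be wrapped in a small function. A minimal sketch: set_nth_match is a hypothetical name, not a pandas API, and n is 0-based like the .index[1] above. It modifies the frame in place and returns nothing.

```python
import pandas as pd


def set_nth_match(frame, mask, column, n, value):
    """Hypothetical helper: set `column` on the n-th (0-based) row where `mask` is True.

    Follows the loc-based approach: look up the label of the n-th matching row,
    then assign through a single .loc call on the original frame.
    """
    label = frame.loc[mask, column].index[n]
    frame.loc[label, column] = value


df = pd.DataFrame([[1, 2], [10, 20], [10, 2], [1, 40]], columns=['a', 'b'])
set_nth_match(df, df['a'] == 1, 'b', 1, 3)
print(df['b'].tolist())  # [2, 20, 2, 3]
```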
Here we build a mask with df['a'] == 1, which returns a boolean Series; we use it to filter the rows and select just column 'b':
In[179]:
df.loc[df['a'] == 1,'b']
Out[179]:
0     2
3    40
Name: b, dtype: int64
Then take the label at position 1 of the result's index:
In[180]:
df.loc[df['a'] == 1,'b'].index[1]
Out[180]: 3
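Since only the label is needed at this step, the same value can also be obtained by applying the mask to the row index directly, without selecting column 'b' first. A small sketch of that equivalent lookup; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [10, 20], [10, 2], [1, 40]], columns=['a', 'b'])

# Boolean-index the row labels themselves; no column has to be selected.
matching_labels = df.index[df['a'] == 1]  # labels 0 and 3
label = matching_labels[1]                # 3, the same label as Out[180]
print(label)
```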
We can then pass this index label back up to the top-level loc.
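One caveat with the label-based approach, offered as a sketch rather than as part of the answer: if the index has duplicate labels, df.loc[label, 'b'] = 3 writes to every row carrying that label. Working purely by position avoids this. The sketch below swaps in numpy.flatnonzero to find the positions of matching rows and then assigns with a single iloc call; the duplicate index [0, 0, 1, 1] is made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative frame with duplicate index labels (made up for this example).
df = pd.DataFrame([[1, 2], [10, 20], [10, 2], [1, 40]],
                  columns=['a', 'b'], index=[0, 0, 1, 1])

# Label-based, as in the answer: the 2nd match has label 1, which two rows share.
by_label = df.copy()
label = by_label.loc[by_label['a'] == 1, 'b'].index[1]
by_label.loc[label, 'b'] = 3
print(by_label['b'].tolist())  # [2, 20, 3, 3]: the a == 10 row changed too

# Position-based alternative: find matching row positions, then one iloc call.
positions = np.flatnonzero(df['a'].to_numpy() == 1)  # positions 0 and 3
df.iloc[positions[1], df.columns.get_loc('b')] = 3
print(df['b'].tolist())  # [2, 20, 2, 3]
```

With a unique index (as in the question) both versions give the same result; the positional one only matters when labels repeat.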
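By contrast, df[df.a == 1].b.iloc[[1]] = 3 is chained indexing, which is why the warning is raised: df[df.a == 1] returns a new object, so the assignment lands on that temporary copy rather than on df.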