
Select and modify a slice in pandas dataframe by integer index

I have a dataframe like the following:

import pandas as pd
df = pd.DataFrame([[1, 2], [10, 20], [10, 2], [1, 40]], columns=['a', 'b'])
    a   b
0   1   2
1   10  20
2   10  2
3   1   40

I want to select the b column where a == 1. The classic way to do this is:

df[df.a == 1].b
0     2
3    40
Name: b, dtype: int64

Then I want to select the i-th row of this selection by position, which is not necessarily the row whose index label is i. Again, there are several ways, such as:

df[df.a == 1].b.iloc[[1]]
Output: 
3    40
Name: b, dtype: int64

So far so good. The problem comes when I try to modify the value I selected: this selection method yields a copy of a slice of the dataframe, not the dataframe itself, so I can't modify it in place.

df[df.a == 1].b.iloc[[1]] = 3
SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

I don't know which step creates the copy, since both of the following give the same warning:

df.iloc[[3]].b = 3
df[df.a == 1].b = 3

So my question is: how can I change a value selected by both a mask (a condition on the a column) and a row position (the rank of the row within the masked selection, not its index label)?

asked Jun 21 '17 at 07:06 by ysearka


1 Answer

Use loc with the boolean mask to find the index label of the row you want, then pass that label to an outer loc:

In[178]:
df.loc[df.loc[df['a'] == 1,'b'].index[1], 'b'] = 3
df

Out[178]: 
    a   b
0   1   2
1  10  20
2  10   2
3   1   3

Here df['a'] == 1 produces a boolean Series, which loc uses to mask the rows of df while selecting just column 'b':

In[179]:
df.loc[df['a'] == 1,'b']

Out[179]: 
0     2
3    40
Name: b, dtype: int64

Then take the label at position 1 from the index of that selection:

In[180]:
df.loc[df['a'] == 1,'b'].index[1]

Out[180]: 3

We can then pass this index label back up to the top-level loc.
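If the index labels might contain duplicates, looking the row up by label could match more than one row. A possible variant, sketched here as an assumption rather than as part of the approach above, works purely by position with iloc. The helper name set_nth_match and its parameters are hypothetical, not part of pandas:

```python
import numpy as np
import pandas as pd


def set_nth_match(frame, mask, n, column, value):
    """Set `column` to `value` in the n-th row (0-based) where `mask` is True.

    Hypothetical helper, not a pandas API. It works by integer position,
    so it does not rely on the index labels being unique.
    """
    # Integer positions of the rows where the mask is True.
    positions = np.flatnonzero(np.asarray(mask))
    row_pos = positions[n]
    # Integer position of the target column (assumes unique column names).
    col_pos = frame.columns.get_loc(column)
    # One indexing call on the original frame, so no intermediate copy.
    frame.iloc[row_pos, col_pos] = value


df = pd.DataFrame([[1, 2], [10, 20], [10, 2], [1, 40]], columns=['a', 'b'])
set_nth_match(df, df['a'] == 1, 1, 'b', 3)
print(df)
```

Like the loc version, this makes a single assignment on df itself, which is what avoids the copy.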

The warning is raised because df[df.a == 1].b.iloc[[1]] = 3 is chained indexing: df[df.a == 1] returns a new object, so the assignment is applied to that temporary copy rather than to df.
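A minimal sketch of the difference, using the dataframe from the question. The warning filter is only there to keep the output quiet, and the exact warning class can differ between pandas versions:

```python
import warnings

import pandas as pd

df = pd.DataFrame([[1, 2], [10, 20], [10, 2], [1, 40]], columns=['a', 'b'])

# Chained indexing: the assignment lands on a temporary copy.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence the chained-assignment warning
    df[df.a == 1].b.iloc[[1]] = 3
after_chained = df['b'].tolist()  # df itself has not changed

# Single loc call on df: the assignment lands on the original frame.
label = df.loc[df['a'] == 1, 'b'].index[1]
df.loc[label, 'b'] = 3
after_loc = df['b'].tolist()

print(after_chained)
print(after_loc)
```

The first list should still show 40 in the last position, while the second shows 3.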

answered Oct 14 '22 at 03:10 by EdChum