
How to replace elements in a column using pandas

Given this data frame:

>>> import numpy as np
>>> import pandas as pd
>>> a = pd.DataFrame(data={'words':['w1','w2','w3','w4','w5'],'value':np.random.rand(5)})
>>> a

     value   words
0  0.157876    w1
1  0.784586    w2
2  0.875567    w3
3  0.649377    w4
4  0.852453    w5

>>> b = pd.Series(data=['w3','w4'])
>>> b

0    w3
1    w4

I'd like to replace the elements of value with zero but only for the words that match those in b. The resulting data frame should therefore look like this:

    value    words
0  0.157876    w1
1  0.784586    w2
2  0           w3
3  0           w4
4  0.852453    w5

I thought of something along these lines: a.value[a.words==b] = 0 but it's obviously wrong.

HappyPy asked Feb 14 '23 18:02


1 Answer

You're close, just use pandas.Series.isin() instead of ==:

>>> a.value[a['words'].isin(b)] = 0
>>> a
      value words
0  0.340138    w1
1  0.533770    w2
2  0.000000    w3
3  0.000000    w4
4  0.002314    w5
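
To see why isin works where == does not, here is a small sketch. It assumes a current pandas version and uses fixed values instead of random ones so the result is reproducible; the flag name eq_failed is only for illustration.

```python
import pandas as pd

# Same shape as the frame in the question, with fixed values (an assumption
# made here so the example is reproducible).
a = pd.DataFrame({'words': ['w1', 'w2', 'w3', 'w4', 'w5'],
                  'value': [0.1, 0.2, 0.3, 0.4, 0.5]})
b = pd.Series(['w3', 'w4'])

# == compares two Series element by element and expects them to carry the
# same labels, so comparing 5 rows against 2 rows raises ValueError.
eq_failed = False
try:
    a['words'] == b
except ValueError:
    eq_failed = True

# isin tests membership instead: one boolean per row of a.
mask = a['words'].isin(b)
print(mask.tolist())  # [False, False, True, True, False]
```

The boolean mask has the same index as a, which is what makes it usable as a row selector.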

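Or, more robustly, use the .loc selector, which selects the rows and assigns in a single step and so avoids the chained assignment above (the original answer used the older .ix selector here):

>>> a.loc[a['words'].isin(b), 'value'] = 0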
>>> a
      value words
0  0.340138    w1
1  0.533770    w2
2  0.000000    w3
3  0.000000    w4
4  0.002314    w5
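
Put together as a standalone script, the single-step version might look like the sketch below. The seed and the variable names mask and original are assumptions added for the example, not part of the question.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
a = pd.DataFrame({'words': ['w1', 'w2', 'w3', 'w4', 'w5'],
                  'value': rng.random(5)})
b = pd.Series(['w3', 'w4'])

original = a['value'].copy()  # kept only to check the untouched rows later

# Rows whose word appears in b.
mask = a['words'].isin(b)

# Row selection and column assignment in one indexing call, so the
# write goes to a itself rather than to a temporary object.
a.loc[mask, 'value'] = 0

print(a)
```

Because the row mask and the column label go into one .loc call, this form does not depend on whether an intermediate selection is a view or a copy.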

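Update: see the documentation for the differences between .ix, .loc and .iloc. Some quotes (note that .ix was deprecated in pandas 0.20 and removed in 1.0, so current code should use .loc or .iloc):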

.loc is strictly label based, will raise KeyError when the items are not found ...

.iloc is strictly integer position based (from 0 to length-1 of the axis), will raise IndexError when the requested indices are out of bounds ...

.ix supports mixed integer and label based access. It is primarily label based, but will fall back to integer positional access. .ix is the most general and will support any of the inputs to .loc and .iloc, as well as support for floating point label schemes. .ix is especially useful when dealing with mixed positional and label based hierarchical indexes ...
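
The label versus position distinction in these quotes can be illustrated with a short sketch. The Series below is hypothetical data whose integer labels deliberately differ from the row positions.

```python
import pandas as pd

# Hypothetical data: labels 2, 0, 1 sit at positions 0, 1, 2.
s = pd.Series([10, 20, 30], index=[2, 0, 1])

by_label = s.loc[0]      # the row labelled 0, which holds 20
by_position = s.iloc[0]  # the first row, which holds 10

# .loc raises KeyError for a label that does not exist.
missing_label = False
try:
    s.loc[5]
except KeyError:
    missing_label = True

# .iloc raises IndexError for a position past the end.
out_of_bounds = False
try:
    s.iloc[5]
except IndexError:
    out_of_bounds = True

print(by_label, by_position)  # 20 10
```

With the default 0..n-1 index used in the question the two selectors happen to agree, which is why the difference only shows up with other indexes.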

Roman Pekar answered Mar 03 '23 00:03