Pandas select columns with regex and divide by value

Question

I want to divide all values in the columns whose names match a regular expression by some value, while keeping the complete dataframe.

As shown in "How to select columns from dataframe by regex", all columns starting with d, for example, can be selected with:

df.filter(regex=("d.*"))

Now that I have selected the columns I need, I want to, for example, divide their values by 2. That is possible with the following code:

df.filter(regex=("d.*")).divide(2)

However, if I try to update my dataframe like this, I get the error "can't assign to function call":

df.filter(regex=("d.*")) = df.filter(regex=("d.*")).divide(2)

How do I properly update my existing df?

piRSquared · Accepted Answer

The following technique is not limited to use with filter and can be applied far more generally.

Setup
I'll use @cᴏʟᴅsᴘᴇᴇᴅ's setup.
Let df be:

   d1  d2  abc
0   5   1    8
1  13   8    6
2   9   4    7
3   9  16   15
4   1  20    9

Inplace update
Use pd.DataFrame.update
update takes the argument dataframe and modifies the calling dataframe in place wherever the index and column labels match those of the argument.

df.update(df.filter(regex='d.*') / 3)
df

         d1        d2  abc
0  1.666667  0.333333    8
1  4.333333  2.666667    6
2  3.000000  1.333333    7
3  3.000000  5.333333   15
4  0.333333  6.666667    9
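A minimal, self-contained sketch of the update approach, for readers who want to run it. The d columns are created as floats here, which is an assumption of this sketch and not part of the original setup; it avoids writing float results into integer columns, which recent pandas versions may warn about or reject. The names scaled and result are illustrative only.

```python
import pandas as pd

# Same values as the setup above, but d1 and d2 are floats (an assumption
# of this sketch) so the divided values fit without changing the dtype.
df = pd.DataFrame({
    "d1": [5.0, 13.0, 9.0, 9.0, 1.0],
    "d2": [1.0, 8.0, 4.0, 16.0, 20.0],
    "abc": [8, 6, 7, 15, 9],
})

scaled = df.filter(regex="d.*") / 2   # new frame holding only d1 and d2
result = df.update(scaled)            # modifies df in place and returns None

print(df)
```

Because update returns None, the call should not be assigned back to df. Note also that update only copies non-NA values from the argument, so a NaN in the scaled frame would leave the original value in place.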

Inline copy
Use pd.DataFrame.assign
I use the double splat ** to unpack the argument dataframe into a dictionary where column names are keys and the series that are the columns are the values. This matches the required signature for assign and overwrites those columns in the copy that is produced. In short, this is a copy of the calling dataframe with the columns overwritten appropriately.

df.assign(**df.filter(regex='d.*').div(3))

         d1        d2  abc
0  1.666667  0.333333    8
1  4.333333  2.666667    6
2  3.000000  1.333333    7
3  3.000000  5.333333   15
4  0.333333  6.666667    9
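To make the copy semantics concrete, here is a small runnable sketch. The names halved and new_df are illustrative and not from the answer; the divisor 2 is chosen only to keep the numbers exact.

```python
import pandas as pd

df = pd.DataFrame({
    "d1": [5, 13, 9, 9, 1],
    "d2": [1, 8, 4, 16, 20],
    "abc": [8, 6, 7, 15, 9],
})

halved = df.filter(regex="d.*").div(2)

# ** unpacks the frame into keyword arguments, roughly
# assign(d1=halved["d1"], d2=halved["d2"]). Column names must be strings.
new_df = df.assign(**halved)

print(new_df)   # d1 and d2 are halved
print(df)       # the original frame is unchanged
```

This form is convenient in method chains, since nothing is mutated; to keep the result, bind it to a name as done with new_df here.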

cs95 · Answer

Use df.columns.str.startswith.

c = df.columns.str.startswith('d')    
df.loc[:, c] /= 2

As an example, consider -

df

   d1  d2  abc
0   5   1    8
1  13   8    6
2   9   4    7
3   9  16   15
4   1  20    9

c = df.columns.str.startswith('d')  
c
array([ True,  True, False], dtype=bool)

df.loc[:, c] /= 3    # 3 instead of 2, just for example
df

         d1        d2  abc
0  1.666667  0.333333    8
1  4.333333  2.666667    6
2  3.000000  1.333333    7
3  3.000000  5.333333   15
4  0.333333  6.666667    9

If you need to pass a regex, use str.contains -

c = df.columns.str.contains(p) # p => your pattern

And the rest of your code follows.
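Put together, the regex variant might look like the sketch below. The pattern p is a hypothetical example (the letter d followed by digits), and the d columns are created as floats, an assumption made here so that the in-place division does not change their dtype.

```python
import pandas as pd

df = pd.DataFrame({
    "d1": [5.0, 13.0, 9.0, 9.0, 1.0],
    "d2": [1.0, 8.0, 4.0, 16.0, 20.0],
    "abc": [8, 6, 7, 15, 9],
})

p = r"^d\d+$"                     # hypothetical pattern: "d" followed by digits
c = df.columns.str.contains(p)    # boolean mask, one entry per column
df.loc[:, c] /= 2                 # divide only the matching columns

print(df)
```

str.contains searches anywhere in the column name, so the ^ and $ anchors are what restrict the match to the whole name.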

jezrael · Answer

I think you need to extract the column names and assign to them:

df[df.filter(regex=("d.*")).columns] = df.filter(regex=("d.*")).divide(2)

Or:

cols = df.columns[df.columns.str.contains('^d.*')]
df[cols] /= 2
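The ^ anchor in the second variant matters more than it may seem: both filter(regex=...) and str.contains search anywhere in the column name, so an unanchored d.* also matches names that merely contain a d. The sketch below illustrates this with hypothetical column names (id is not in the question's data) and then assigns by label.

```python
import pandas as pd

# Hypothetical columns chosen to show the difference; not from the question.
df = pd.DataFrame({"d1": [4, 8], "id": [1, 2], "abc": [6, 10]})

unanchored = list(df.filter(regex="d.*").columns)   # "d" anywhere in the name
anchored = list(df.filter(regex="^d").columns)      # "d" at the start only

cols = df.columns[df.columns.str.contains("^d")]
df[cols] = df[cols] / 2    # assign by label; only d1 changes

print(unanchored, anchored)
print(df)
```

Assigning through df[cols] replaces the selected columns, so the integer column d1 simply becomes a float column.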

Tags: python, regex, pandas

NumesSanguis
