I want to divide all values in the columns whose names match a regular expression by some value, while still keeping the complete dataframe.
As described in "How to select columns from dataframe by regex", all columns starting with d, for example, can be selected with:
df.filter(regex=("d.*"))
Now that I have selected the columns I need, I want to, for example, divide their values by 2, which is possible with the following code:
df.filter(regex=("d.*")).divide(2)
However, if I try to update my dataframe like this, it raises a "can't assign to function call" error:
df.filter(regex=("d.*")) = df.filter(regex=("d.*")).divide(2)
How do I properly update my existing df?
The following technique is not limited to use with filter and can be applied far more generally.
Setup
I'll use @cᴏʟᴅsᴘᴇᴇᴅ's setup.
Let df be:
   d1  d2  abc
0   5   1    8
1  13   8    6
2   9   4    7
3   9  16   15
4   1  20    9
Inplace update
Use pd.DataFrame.update.
update takes the argument dataframe and alters the calling dataframe in place wherever the index and column labels match those of the argument.
df.update(df.filter(regex='d.*') / 3)
df
         d1        d2  abc
0  1.666667  0.333333    8
1  4.333333  2.666667    6
2  3.000000  1.333333    7
3  3.000000  5.333333   15
4  0.333333  6.666667    9
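A minimal, self-contained sketch of the update approach, in case you want to run it end to end. It rebuilds the setup frame by hand; storing d1 and d2 as floats is my own assumption, made so that the divided values fit the existing column dtype. The names scaled and result are only illustrative.

```python
import pandas as pd

# Same values as the setup above. d1 and d2 are stored as floats here
# (an assumption of this sketch) so the divided values fit the column dtype.
df = pd.DataFrame({
    "d1": [5.0, 13.0, 9.0, 9.0, 1.0],
    "d2": [1.0, 8.0, 4.0, 16.0, 20.0],
    "abc": [8, 6, 7, 15, 9],
})

# filter returns a new dataframe holding only the matching columns.
scaled = df.filter(regex="d.*") / 3

# update writes the matching labels back into df and returns None,
# so do not write df = df.update(...).
result = df.update(scaled)

print(result)
print(df)
```

Because update aligns on both index and column labels, columns that are absent from the argument (abc here) are left alone.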
Inline copy
Use pd.DataFrame.assign.
I use the double splat ** to unpack the argument dataframe into keyword arguments, where the column names are the keys and the series that are the columns are the values. This matches the required signature for assign and overwrites those columns in the copy that is produced. In short, this is a copy of the calling dataframe with the columns overwritten appropriately.
df.assign(**df.filter(regex='d.*').div(3))
         d1        d2  abc
0  1.666667  0.333333    8
1  4.333333  2.666667    6
2  3.000000  1.333333    7
3  3.000000  5.333333   15
4  0.333333  6.666667    9
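To make the ** unpacking less magical, here is a small sketch that spells out the intermediate mapping before handing it to assign. The names scaled, as_kwargs and new_df are illustrative only, and the frame is rebuilt by hand from the setup above. Note that this only works when the column names are strings, since they become keyword argument names.

```python
import pandas as pd

df = pd.DataFrame({
    "d1": [5, 13, 9, 9, 1],
    "d2": [1, 8, 4, 16, 20],
    "abc": [8, 6, 7, 15, 9],
})

scaled = df.filter(regex="d.*").div(3)

# Spell out what ** sees: a mapping of column name -> Series.
as_kwargs = dict(scaled.items())

# assign returns a new dataframe; df itself is not modified.
new_df = df.assign(**as_kwargs)

print(list(as_kwargs))
print(new_df)
print(df)
```

Writing df.assign(**scaled) directly is equivalent; the explicit dict is only there to show the shape of the arguments.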
Use df.columns.str.startswith.
c = df.columns.str.startswith('d')
df.loc[:, c] /= 2
As an example, consider:
df
   d1  d2  abc
0   5   1    8
1  13   8    6
2   9   4    7
3   9  16   15
4   1  20    9
c = df.columns.str.startswith('d')
c
array([ True, True, False], dtype=bool)
df.loc[:, c] /= 3 # 3 instead of 2, just for example
df
         d1        d2  abc
0  1.666667  0.333333    8
1  4.333333  2.666667    6
2  3.000000  1.333333    7
3  3.000000  5.333333   15
4  0.333333  6.666667    9
If you need to pass a regex, use str.contains:
c = df.columns.str.contains(p) # p => your pattern
And the rest of your code follows.
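A short sketch of the str.contains variant with a concrete pattern. One thing to watch: str.contains searches anywhere in the column name (as does filter(regex=...)), so an unanchored 'd.*' also matches a name that merely contains a d. The extra column add and the pattern r"^d" are hypothetical, added only to show that difference, and the numeric columns are floats by assumption so the in-place division keeps their dtype.

```python
import pandas as pd

# 'add' is a hypothetical extra column that contains a 'd' but does not
# start with one.
df = pd.DataFrame({
    "d1": [5.0, 13.0],
    "d2": [1.0, 8.0],
    "abc": [8, 6],
    "add": [2.0, 4.0],
})

loose = df.columns.str.contains("d.*")  # matches 'd' anywhere in the name
p = r"^d"                               # anchored: names starting with 'd'
c = df.columns.str.contains(p)

print(loose)
print(c)

# Divide only the columns selected by the anchored mask.
df.loc[:, c] /= 2
print(df)
```

If only a literal prefix is needed, startswith avoids the anchoring question entirely.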
I think you need to extract the column names and assign:
df[df.filter(regex=("d.*")).columns] = df.filter(regex=("d.*")).divide(2)
Or:
cols = df.columns[df.columns.str.contains('^d.*')]
df[cols] /= 2
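For completeness, a runnable sketch of the column-name approach, computing the matching labels once and reusing them on both sides of the assignment. The three-row frame and the anchored pattern "^d" are my own choices for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "d1": [5, 13, 9],
    "d2": [1, 8, 4],
    "abc": [8, 6, 7],
})

# Compute the matching column labels once instead of filtering twice.
cols = df.filter(regex="^d").columns

# Assigning to a list of existing labels replaces those columns in df.
df[cols] = df[cols] / 2

print(list(cols))
print(df)
```

Assigning through the labels is what sidesteps the original error: df[cols] on the left is a subscript, which Python allows as an assignment target, whereas df.filter(...) is a function call.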