
Pandas select columns with regex and divide by value

I want to divide all values in the columns whose names match a regular expression by some value, while keeping the complete dataframe.

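As shown in How to select columns from dataframe by regex, e.g. all columns containing a d can be selected with the following (filter matches the pattern anywhere in the name, so use "^d" to select only the columns starting with d):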

df.filter(regex=("d.*"))
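DataFrame.filter(regex=...) keeps every column for which re.search finds the pattern anywhere in the name, so "d.*" is not tied to the start of the name. A small sketch, using a hypothetical extra column add that contains a d but does not start with one:

```python
import pandas as pd

# "add" is a hypothetical column that contains a "d" but does not start with one
df = pd.DataFrame({"d1": [5, 13], "d2": [1, 8], "add": [8, 6]})

unanchored = df.filter(regex="d.*")  # a "d" anywhere in the name
anchored = df.filter(regex="^d")     # only names that start with "d"

print(list(unanchored.columns))  # ['d1', 'd2', 'add']
print(list(anchored.columns))    # ['d1', 'd2']
```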

Now that I have selected the columns I need, I want to, e.g., divide their values by 2, which is possible with the following code:

df.filter(regex=("d.*")).divide(2)

However, if I try to update my dataframe like this, I get SyntaxError: can't assign to function call:

df.filter(regex=("d.*")) = df.filter(regex=("d.*")).divide(2)
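The error comes from Python's compiler, before pandas runs at all: a function call cannot be the target of =. Even if it could, filter returns a new DataFrame, so changing that result would not touch df. A small sketch on a hypothetical two-row frame that checks both points:

```python
import pandas as pd

df = pd.DataFrame({"d1": [5, 13], "d2": [1, 8], "abc": [8, 6]})

# The failing statement is rejected when the source is compiled, not by pandas
try:
    compile("df.filter(regex='d.*') = df.filter(regex='d.*').divide(2)", "<example>", "exec")
    rejected = False
except SyntaxError:
    rejected = True
print("SyntaxError raised:", rejected)

# filter(...).divide(2) builds a new DataFrame; df itself keeps its values
halved = df.filter(regex="d.*").divide(2)
print(halved)
print(df)
```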

How do I properly update my existing df?

asked Jan 15 '18 07:01 by NumesSanguis



3 Answers

The following technique is not limited to use with filter and can be applied far more generally.

Setup
I'll use @cᴏʟᴅsᴘᴇᴇᴅ's setup.
Let df be:

   d1  d2  abc
0   5   1    8
1  13   8    6
2   9   4    7
3   9  16   15
4   1  20    9

In-place update
Use pd.DataFrame.update
update takes the argument dataframe and modifies the calling dataframe in place wherever their index and column labels match.

df.update(df.filter(regex='d.*') / 3)
df

         d1        d2  abc
0  1.666667  0.333333    8
1  4.333333  2.666667    6
2  3.000000  1.333333    7
3  3.000000  5.333333   15
4  0.333333  6.666667    9
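One detail of update worth knowing: wherever the argument holds NaN, the original value is kept rather than overwritten. A minimal sketch, assuming float columns (so writing the divided values back cannot change any column's dtype) and a NaN inserted by hand for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"d1": [5.0, 13.0], "d2": [1.0, 8.0], "abc": [8.0, 6.0]})

scaled = df.filter(regex="^d") / 2
scaled.loc[0, "d1"] = np.nan  # pretend one result is missing

returned = df.update(scaled)  # modifies df in place and returns None
print(df)  # d1 keeps 5.0 in row 0 because the argument had NaN there
```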

Inline copy
Use pd.DataFrame.assign
I use the double splat ** to unpack the argument dataframe into keyword arguments: the column names become the keys and the columns (Series) become the values. That matches the signature assign expects, so those columns are overwritten in the copy it returns. In short, the result is a copy of the calling dataframe with the selected columns replaced.

df.assign(**df.filter(regex='d.*').div(3))

         d1        d2  abc
0  1.666667  0.333333    8
1  4.333333  2.666667    6
2  3.000000  1.333333    7
3  3.000000  5.333333   15
4  0.333333  6.666667    9
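The ** unpacking works because a DataFrame behaves like a mapping: keys() returns the column labels and df[label] returns each column. A small sketch of what assign receives, plus the one restriction this implies: keyword argument names must be strings, so a frame with integer column labels (a hypothetical case here) cannot be unpacked this way.

```python
import pandas as pd

df = pd.DataFrame({"d1": [5, 13, 9], "d2": [1, 8, 4], "abc": [8, 6, 7]})
selected = df.filter(regex="^d").div(3)

# ** calls selected.keys() (the column labels) and selected[label] for each one,
# so assign() receives d1=<Series> and d2=<Series> as keyword arguments
kwargs = dict(**selected)
print(list(kwargs))  # ['d1', 'd2']

out = df.assign(**selected)  # a new DataFrame; df itself is left untouched
print(out)

# Keyword names must be strings, so integer column labels cannot be unpacked
numeric = pd.DataFrame({0: [1, 2], 1: [3, 4]})
try:
    numeric.assign(**numeric.div(2))
    int_labels_rejected = False
except TypeError:
    int_labels_rejected = True
print("TypeError for integer labels:", int_labels_rejected)
```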
answered Oct 01 '22 16:10 by piRSquared



Use df.columns.str.startswith.

c = df.columns.str.startswith('d')    
df.loc[:, c] /= 2

As an example, consider:

df

   d1  d2  abc
0   5   1    8
1  13   8    6
2   9   4    7
3   9  16   15
4   1  20    9

c = df.columns.str.startswith('d')  
c
array([ True,  True, False], dtype=bool)

df.loc[:, c] /= 3    # 3 instead of 2, just for example
df

         d1        d2  abc
0  1.666667  0.333333    8
1  4.333333  2.666667    6
2  3.000000  1.333333    7
3  3.000000  5.333333   15
4  0.333333  6.666667    9
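In this example the integer columns are divided in place through .loc, so float results are written back into int64 columns. Recent pandas versions may warn about this with a FutureWarning about setting an item of incompatible dtype. One way to sidestep it, sketched here as an assumption rather than part of the original answer, is to cast the selected columns to float before dividing:

```python
import pandas as pd

df = pd.DataFrame({"d1": [5, 13, 9, 9, 1], "d2": [1, 8, 4, 16, 20], "abc": [8, 6, 7, 15, 9]})

mask = df.columns.str.startswith("d")  # boolean array, one entry per column

# Cast only the selected columns to float so the division result fits their dtype
df = df.astype({col: float for col in df.columns[mask]})
df.loc[:, mask] /= 3

print(df)
print(df.dtypes)  # d1 and d2 are float64, abc keeps its integer dtype
```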

If you need to match a regex, use str.contains:

c = df.columns.str.contains(p) # p => your pattern

The rest of the code stays the same.
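str.contains searches for the pattern anywhere in each column name, so an unanchored regex can select more columns than startswith does. A sketch with a hypothetical column add that makes the difference visible:

```python
import pandas as pd

# "add" is a hypothetical column that contains a "d" but does not start with one
df = pd.DataFrame({"d1": [5, 13], "d2": [1, 8], "add": [8, 6]})

loose = df.columns.str.contains("d.*")    # matches d1, d2 and add
anchored = df.columns.str.contains("^d")  # matches d1 and d2 only
prefix = df.columns.str.startswith("d")   # same selection as the anchored regex

print(list(df.columns[loose]))     # ['d1', 'd2', 'add']
print(list(df.columns[anchored]))  # ['d1', 'd2']
```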

answered Oct 01 '22 16:10 by cs95



I think you need to extract the column names and assign to them:

df[df.filter(regex=("d.*")).columns] = df.filter(regex=("d.*")).divide(2)

Or:

cols = df.columns[df.columns.str.contains('^d.*')]
df[cols] /= 2
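Assigning with df[cols] = ... replaces the selected columns as a whole, so the integer columns simply come back as float columns. A minimal sketch on the same example data, using the anchored pattern ^d (an assumption; the answer's '^d.*' selects the same columns):

```python
import pandas as pd

df = pd.DataFrame({"d1": [5, 13, 9, 9, 1], "d2": [1, 8, 4, 16, 20], "abc": [8, 6, 7, 15, 9]})

# Labels of the columns whose names start with "d"
cols = df.columns[df.columns.str.contains("^d")]

# df[cols] = ... replaces whole columns, so they can switch from int to float
df[cols] = df[cols] / 2

print(df)
print(df.dtypes)
```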
answered Oct 01 '22 14:10 by jezrael
