
How to replace a value in a pandas dataframe with column name based on a condition?

I have a dataframe that looks something like this:

[image of the dataframe]

I want to replace every 1 in columns A through D with the name of its column, so that the final result resembles:

[image of the desired result]

How can I do that?

You can recreate my dataframe with this:

import pandas as pd

dfz = pd.DataFrame({'A' : [1,0,0,1,0,0],
                    'B' : [1,0,0,1,0,1],
                    'C' : [1,0,0,1,3,1],
                    'D' : [1,0,0,1,0,0],
                    'E' : [22.0,15.0,None,10.,None,557.0]})
samthebrand asked May 04 '16 15:05



2 Answers

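A solution using where, which keeps each value where the condition is true and substitutes the column label elsewhere. Note that the condition is tested on every column, including E: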

>>> dfz.where(dfz != 1, dfz.columns.to_series(), axis=1)
   A  B  C  D      E
0  A  B  C  D   22.0
1  0  0  0  0   15.0
2  0  0  0  0    NaN
3  A  B  C  D   10.0
4  0  0  3  0    NaN
5  0  B  C  0  557.0
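
If only columns A through D should be relabelled, as the question asks, the same where call could be limited to those columns. This is a minimal sketch rather than part of the original answer; the names cols, labels, subset, relabelled and result are illustrative.

```python
import pandas as pd

dfz = pd.DataFrame({'A': [1, 0, 0, 1, 0, 0],
                    'B': [1, 0, 0, 1, 0, 1],
                    'C': [1, 0, 0, 1, 3, 1],
                    'D': [1, 0, 0, 1, 0, 0],
                    'E': [22.0, 15.0, None, 10.0, None, 557.0]})

cols = ['A', 'B', 'C', 'D']  # illustrative name: the columns to relabel
# One label per column, indexed by column name so it aligns along axis=1.
labels = pd.Series(cols, index=cols, dtype=object)

subset = dfz[cols]
# Keep values that are not 1; everywhere else take the matching column label.
relabelled = subset.where(subset != 1, labels, axis=1)

# Work on a copy so the original dfz stays numeric.
result = dfz.copy()
result[cols] = relabelled  # column E is never inspected or changed
print(result)
```

Because the condition is built from subset only, a 1 appearing in E would be left as it is.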
edvardlindelof answered Oct 01 '22 13:10


One way could be to use replace and pass in a Series mapping column labels to values (those same labels in this case):

>>> dfz.loc[:, 'A':'D'].replace(1, pd.Series(dfz.columns, dfz.columns))
   A  B  C  D
0  A  B  C  D
1  0  0  0  0
2  0  0  0  0
3  A  B  C  D
4  0  0  3  0
5  0  B  C  0

To make the change permanent, you'd assign the returned DataFrame back to dfz.loc[:, 'A':'D'].
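
As a sketch of that assignment step: the version below writes the result back with dfz[cols] = ... rather than through dfz.loc[:, 'A':'D'], on the assumption that recent pandas versions may warn about, or refuse, placing strings into integer columns through .loc. The names cols and mapping are illustrative.

```python
import pandas as pd

dfz = pd.DataFrame({'A': [1, 0, 0, 1, 0, 0],
                    'B': [1, 0, 0, 1, 0, 1],
                    'C': [1, 0, 0, 1, 3, 1],
                    'D': [1, 0, 0, 1, 0, 0],
                    'E': [22.0, 15.0, None, 10.0, None, 557.0]})

cols = ['A', 'B', 'C', 'D']  # illustrative name: the columns to relabel
# Maps each column label to the replacement value for that column
# (here, the label itself).
mapping = pd.Series(cols, index=cols, dtype=object)

# Assigning by column list replaces the columns outright, so their dtype
# is free to change from integer to object.
dfz[cols] = dfz[cols].replace(1, mapping)
print(dfz)
```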

Solutions aside, it's useful to keep in mind that you may lose a lot of performance benefits when you mix numeric and string types in columns, as pandas is forced to use the generic 'object' dtype to hold the values.
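
The dtype change described above can be checked directly. The sketch below compares the column dtypes before and after the replacement; the names cols, before, replaced and after are illustrative.

```python
import pandas as pd

dfz = pd.DataFrame({'A': [1, 0, 0, 1, 0, 0],
                    'B': [1, 0, 0, 1, 0, 1],
                    'C': [1, 0, 0, 1, 3, 1],
                    'D': [1, 0, 0, 1, 0, 0],
                    'E': [22.0, 15.0, None, 10.0, None, 557.0]})

cols = ['A', 'B', 'C', 'D']  # illustrative name: the columns to relabel

before = dfz[cols].dtypes  # integer dtypes while the columns hold only numbers
replaced = dfz[cols].replace(1, pd.Series(cols, index=cols, dtype=object))
after = replaced.dtypes    # object dtype once strings and numbers are mixed

# Show both sets of dtypes side by side, one row per column.
print(pd.DataFrame({'before': before, 'after': after}))
```

If the numeric values are still needed for computation, keeping the relabelled result in a separate DataFrame, as here, leaves the original integer columns intact.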

Alex Riley answered Oct 01 '22 13:10