
Factorize values across dataframe columns with consistent mappings

How can I use pandas factorize with values that exist across two columns?

Specifically, I am trying to convert the values in two columns to numeric codes and put those codes into new columns, such that the mapping is consistent across both input columns 'A' and 'B' (the same value gets the same code in either column).

Existing DataFrame:

     A   B
0    a   b
1    c   a
2    d   a
3    e   c
4    c   b
5    b   e
6    e   f

Desired Output:

     A   B   A_ID  B_ID 
0    a   b     0     4
1    c   a     1     0
2    d   a     2     0
3    e   c     3     1
4    c   b     1     4
5    b   e     4     3
6    e   f     3     5

I am able to use factorize successfully for one column using:

df['A_ID'] = pd.factorize(df.A)[0]

How could I accomplish this with consistent mappings for values across two columns? Do I need to resort to using a custom lambda function instead, or is there a way to accomplish this with factorize?

Gabe F. asked Oct 16 '17 02:10



1 Answer

Use pd.factorize to get one shared array of unique values, then apply pd.Categorical to each column:

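# Factorize every value, column by column, to get one shared array of uniques.
_, b = pd.factorize(df.values.T.reshape(-1))  # or df.values.ravel('F'), as suggested by Zero
# Encode each column against those shared categories.
r = df.apply(lambda x: pd.Categorical(x, b).codes).add_suffix('_ID')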

   A_ID  B_ID
0     0     4
1     1     0
2     2     0
3     3     1
4     1     4
5     4     3
6     3     5

pd.concat([df, r], axis=1)

   A  B  A_ID  B_ID
0  a  b     0     4
1  c  a     1     0
2  d  a     2     0
3  e  c     3     1
4  c  b     1     4
5  b  e     4     3
6  e  f     3     5
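
A variation on the same idea, offered as a sketch rather than as part of the original answer: factorize the flattened values once and reshape the resulting codes back into the frame's shape, which avoids the per-column apply. The frame is rebuilt from the question's example, and the names codes, uniques, ids, and result are illustrative only.

```python
import pandas as pd

# Example frame from the question.
df = pd.DataFrame({
    "A": list("acdecbe"),
    "B": list("baacbef"),
})

# Flatten in column-major order (all of A, then all of B) and factorize once,
# so both columns share a single value -> code mapping.
codes, uniques = pd.factorize(df.to_numpy().ravel("F"))

# Reshape the flat codes back to (n_rows, n_cols); order="F" undoes the
# column-major flattening above.
ids = pd.DataFrame(
    codes.reshape(df.shape, order="F"),
    index=df.index,
    columns=[f"{col}_ID" for col in df.columns],
)

result = pd.concat([df, ids], axis=1)
print(result)
```

Because the codes are assigned in order of first appearance while reading column A and then column B, this should reproduce the A_ID and B_ID columns shown above. uniques[code] recovers the original value for any code.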
cs95 answered Nov 14 '22 14:11
