How can I use pandas factorize with values that exist across two columns?
Specifically, I am trying to convert values that exist in two columns to numeric values, and put the corresponding factorized values into new columns, such that the factorization is consistent with the two input columns 'A' and 'B'.
Existing DataFrame:
A B
0 a b
1 c a
2 d a
3 e c
4 c b
5 b e
6 e f
Desired Output:
A B A_ID B_ID
0 a b 0 4
1 c a 1 0
2 d a 2 0
3 e c 3 1
4 c b 1 4
5 b e 4 3
6 e f 3 5
I am able to use factorize successfully for one column using:
df['A_ID'] = pd.factorize(df.A)[0]
How could I accomplish this with consistent mappings for values across two columns? Do I need to resort to using a custom lambda function instead, or is there a way to accomplish this with factorize?
pd.factorize() encodes an array as integer codes by identifying its distinct values. It returns the codes together with the unique values, in order of first appearance.
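As a quick illustration of what factorize returns, here is a minimal sketch using the values of column 'A' from the question. The names codes and uniques are just illustrative.

```python
import pandas as pd

# Values of column 'A' from the question
a = pd.Series(["a", "c", "d", "e", "c", "b", "e"])

# factorize returns (codes, uniques); codes index into uniques
codes, uniques = pd.factorize(a)

print(codes)    # one integer code per element
print(uniques)  # distinct labels in order of first appearance
```

Each label gets the position of its first appearance among the distinct values, which is why factorizing 'A' and 'B' separately would give inconsistent codes.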
You can use a for loop to iterate over the columns of a DataFrame. pandas offers several ways to iterate, including items() (named iteritems() before pandas 2.0), indexing with [], transpose(), iterrows(), enumerate(), and numpy.asarray().
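If you prefer an explicit loop over the columns, one possible sketch is to build a single label-to-code mapping from all values first and then apply it column by column. The names mapping and uniques are assumptions for illustration, and the DataFrame is rebuilt from the question's data.

```python
import pandas as pd

df = pd.DataFrame({"A": list("acdecbe"), "B": list("baacbef")})

# Collect distinct labels in column-major order (all of A, then B),
# so codes follow first appearance in A before B
uniques = pd.unique(df.to_numpy().ravel("F"))

# One shared mapping: label -> integer code
mapping = {label: code for code, label in enumerate(uniques)}

# Loop over a fixed list of columns, since new columns are added inside the loop
for col in ["A", "B"]:
    df[f"{col}_ID"] = df[col].map(mapping)

print(df)
```

Because both columns use the same mapping, a given label always receives the same ID regardless of which column it appears in.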
pd.factorize, apply + pd.Categorical:
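# Factorize all values in column-major order; b holds the unique labels
_, b = pd.factorize(df.values.T.reshape(-1, ))
# or df.values.ravel('F'), as suggested by Zero
# Encode each column against the shared label order in b
r = df.apply(lambda x: pd.Categorical(x, b).codes).add_suffix('_ID')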
A_ID B_ID
0 0 4
1 1 0
2 2 0
3 3 1
4 1 4
5 4 3
6 3 5
pd.concat([df, r], axis=1)
A B A_ID B_ID
0 a b 0 4
1 c a 1 0
2 d a 2 0
3 e c 3 1
4 c b 1 4
5 b e 4 3
6 e f 3 5
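A variant that skips apply and pd.Categorical is to factorize the flattened values once and reshape the codes back into the frame's shape. This is only a sketch under the assumption that every column should share one code space. The names ids and out are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": list("acdecbe"), "B": list("baacbef")})

# Flatten column by column ('F' order) so codes follow first appearance in A, then B
codes, uniques = pd.factorize(df.to_numpy().ravel("F"))

# Reshape the flat codes back to the original shape, again in column-major order
ids = pd.DataFrame(
    codes.reshape(df.shape, order="F"),
    index=df.index,
    columns=df.columns,
).add_suffix("_ID")

out = pd.concat([df, ids], axis=1)
print(out)
```

The same order must be used for both ravel and reshape. Otherwise the codes end up in the wrong cells.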