
pandas get string label from factorized dataframe

I factorized a column of my pandas DataFrame but overwrote the original column values.

Is there any way to get the original label-to-integer mapping back for reference?

Example:

df_test = pd.DataFrame({'col1': pd.Series(['cat','dog','cat','mouse'])})
df_test['col1'] = pd.factorize(df_test['col1'])[0]
df_test


However, I want to be able to call the line below again to check which label each integer maps to. Is there any way to check the mapping without re-initializing the DataFrame?

pd.factorize(df_test['col1'])[1]
asked May 25 '26 05:05 by jxn



1 Answer

I'd suggest a slightly different approach: use the categorical dtype, which keeps the labels and the integer codes together:

In [40]: df_test['col1'] = df_test['col1'].astype('category')

In [41]: df_test
Out[41]:
    col1
0    cat
1    dog
2    cat
3  mouse

In [42]: df_test.dtypes
Out[42]:
col1    category
dtype: object

And if you need the integer codes:

In [44]: df_test['col1'].cat.codes
Out[44]:
0    0
1    1
2    0
3    2
dtype: int8
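To check what each integer maps to, the categories of the column can be paired with their positions. The sketch below is an illustration rather than part of the original answer; the names `code_to_label`, `labels`, `codes`, `uniques`, and `restored` are made up for this example. It also shows the `pd.factorize` route: the labels cannot be recovered from the integers alone once the column has been overwritten, so the second return value has to be kept at the time of factorizing.

```python
import pandas as pd

df_test = pd.DataFrame({'col1': pd.Series(['cat', 'dog', 'cat', 'mouse'])})

# Categorical route: the column keeps its labels, and codes are derived on demand.
df_test['col1'] = df_test['col1'].astype('category')

# Position i in .cat.categories is the label for integer code i.
code_to_label = dict(enumerate(df_test['col1'].cat.categories))

# Translate the integer codes back into labels via the lookup table.
labels = df_test['col1'].cat.codes.map(code_to_label)

# factorize route: keep the uniques (second return value) instead of discarding them.
codes, uniques = pd.factorize(pd.Series(['cat', 'dog', 'cat', 'mouse']))

# uniques[i] is the label for code i, so take() rebuilds the original values.
restored = list(uniques.take(codes))

print(code_to_label)
print(restored)
```

With the categorical dtype there is nothing extra to store, since the mapping travels with the column. With `pd.factorize`, the `uniques` object must be saved somewhere alongside the DataFrame.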

Memory usage for a 400K-row DataFrame:

In [74]: df_test = pd.DataFrame({'col1': pd.Series(['cat','dog','cat','mouse'])})

In [75]: df_test = pd.concat([df_test] * 10**5, ignore_index=True)

In [76]: df_test.shape
Out[76]: (400000, 1)

In [77]: d1 = df_test.copy()

In [78]: d2 = df_test.copy()

In [79]: d1.col1 = pd.factorize(d1.col1)[0]

In [80]: d2.col1 = d2.col1.astype('category')

In [81]: df_test.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 400000 entries, 0 to 399999
Data columns (total 1 columns):
col1    400000 non-null object
dtypes: object(1)
memory usage: 3.1+ MB

In [82]: d1.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 400000 entries, 0 to 399999
Data columns (total 1 columns):
col1    400000 non-null int64
dtypes: int64(1)
memory usage: 3.1 MB

In [83]: d2.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 400000 entries, 0 to 399999
Data columns (total 1 columns):
col1    400000 non-null category
dtypes: category(1)
memory usage: 390.7 KB           # the categorical column takes about 8x less memory
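The `+` in `3.1+ MB` above means that `info()` reports only a lower bound for the string column. A rough way to compare actual sizes is `Series.memory_usage` with `deep=True`. This is a sketch under assumptions: the variable names are made up for this example, and the exact byte counts depend on the pandas version and platform, so no figures are quoted here.

```python
import pandas as pd

# Rebuild the 400K-row frame used in the session above.
base = pd.DataFrame({'col1': pd.Series(['cat', 'dog', 'cat', 'mouse'])})
big = pd.concat([base] * 10**5, ignore_index=True)

as_codes = pd.Series(pd.factorize(big['col1'])[0])   # plain integer codes
as_category = big['col1'].astype('category')         # codes plus stored labels

# deep=True also counts the string data itself, not just the references to it.
str_bytes = big['col1'].memory_usage(index=False, deep=True)
int_bytes = as_codes.memory_usage(index=False)
cat_bytes = as_category.memory_usage(index=False, deep=True)

print(f"strings:  {str_bytes:>12,} bytes")
print(f"integers: {int_bytes:>12,} bytes")
print(f"category: {cat_bytes:>12,} bytes")
```

The categorical column stays small because it stores compact integer codes plus a single copy of each distinct label.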
answered May 27 '26 17:05 by MaxU - stop WAR against UA



