I have applied a LabelEncoder() on a dataframe, which returns the following:
The order/new_cart values have different label-encoded numbers, like 70, 64, 71, etc.
Is this inconsistent labeling, or did I do something wrong somewhere?
LabelEncoder works on one-dimensional arrays. If you apply it to multiple columns, it will be consistent within columns but not across columns.
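Here is a minimal sketch of the column-by-column behaviour on a small made-up frame (the column names x and y and the values are hypothetical). Applying fit_transform per column fits a separate encoding for each column, so the label 'c' ends up with different codes in the two columns:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy frame: 'c' appears in both columns.
df = pd.DataFrame({"x": ["a", "c", "c"], "y": ["c", "d", "e"]})

# apply() hands each column to fit_transform separately,
# so each column is encoded using only its own labels.
per_column = df.apply(LabelEncoder().fit_transform)
print(per_column)
```

Here 'c' is coded 1 in column x (whose labels are a, c) but 0 in column y (whose labels are c, d, e).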
As a workaround, you can convert the dataframe to a one-dimensional array and call LabelEncoder on that array.
Assume this is the dataframe:
df
Out[372]:
0 1 2
0 d d a
1 c a c
2 c c b
3 e e d
4 d d e
5 d b e
6 e e b
7 a e b
8 b c c
9 e a b
With ravel and then reshaping:
pd.DataFrame(LabelEncoder().fit_transform(df.values.ravel()).reshape(df.shape), columns = df.columns)
Out[373]:
0 1 2
0 3 3 0
1 2 0 2
2 2 2 1
3 4 4 3
4 3 3 4
5 3 1 4
6 4 4 1
7 0 4 1
8 1 2 2
9 4 0 1
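A self-contained version of the same ravel/reshape idea, using only the first three rows of the frame above, might look like this (the imports and the trimmed frame are additions for illustration). Every letter gets one code regardless of which column it sits in:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# First three rows of the example frame above.
df = pd.DataFrame([["d", "d", "a"], ["c", "a", "c"], ["c", "c", "b"]])

le = LabelEncoder()
# Flatten to 1-D, fit and transform once, then restore the original 2-D shape.
codes = le.fit_transform(df.values.ravel()).reshape(df.shape)
encoded = pd.DataFrame(codes, index=df.index, columns=df.columns)
print(encoded)
```

Passing index=df.index as well as columns keeps the original row labels if the frame does not use the default integer index.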
Edit:
If you want to store the labels, you need to save the LabelEncoder object.
le = LabelEncoder()
df2 = pd.DataFrame(le.fit_transform(df.values.ravel()).reshape(df.shape), columns = df.columns)
Now le.classes_ gives you the classes in sorted order; each class's position in that array is its integer code (starting from 0).
le.classes_
Out[390]: array(['a', 'b', 'c', 'd', 'e'], dtype=object)
If you want to access the integer by label, you can construct a dict:
dict(zip(le.classes_, np.arange(len(le.classes_))))
Out[388]: {'a': 0, 'b': 1, 'c': 2, 'd': 3, 'e': 4}
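If you'd rather not use NumPy for the lookup, a plain dict comprehension over le.classes_ gives the same mapping, and the reverse mapping is just as short. A minimal sketch, with the encoder fitted on a hypothetical list of labels:

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical labels; fit() returns the encoder itself.
le = LabelEncoder().fit(["d", "c", "e", "a", "b"])

# classes_ is sorted, so a label's position in it is its integer code.
label_to_code = {label: code for code, label in enumerate(le.classes_)}
code_to_label = dict(enumerate(le.classes_))
print(label_to_code)
```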
You can do the same with the transform method, without building a dict. transform expects a 1-D array-like, so wrap the label in a list:
le.transform(['c'])
Out[395]: array([2])
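Going the other way, inverse_transform turns codes back into labels. A short sketch of the round trip, assuming an encoder fitted on the labels a to e:

```python
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder().fit(["a", "b", "c", "d", "e"])

codes = le.transform(["c", "a", "e"])    # array([2, 0, 4])
labels = le.inverse_transform(codes)     # back to the original strings
print(codes, labels)
```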
Because of the way the apply and fit_transform functions work, you are accidentally calling the fit function on each column of your frame. Let's walk through what's happening in the following line:
labeled_df = String_df.apply(LabelEncoder().fit_transform)
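This line creates a new LabelEncoder object and calls apply, passing in the encoder's fit_transform method. For each column in your DataFrame, apply calls fit_transform on your encoder, passing in the column as an argument. This does two things: it refits the encoder to that column (modifying its state), and it returns the codes for that column's elements based on the new fit. The codes will not be consistent across columns because each time you call fit_transform, the LabelEncoder object can choose new transformation codes.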
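One way to see the refitting happen is to keep a reference to the encoder and inspect it after apply has run. In this sketch (the frame is made up), the two columns share no labels, yet both come out as 0 and 1, and the encoder's classes_ only remembers the last column it was fitted on:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical frame with no labels shared between columns.
df = pd.DataFrame({"x": ["a", "b"], "y": ["c", "d"]})

enc = LabelEncoder()
# Each column triggers a fresh fit, overwriting enc.classes_ every time.
codes = df.apply(enc.fit_transform)
print(codes)         # both columns come out as 0, 1
print(enc.classes_)  # only the last column's labels survive
```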
To get consistent codes, fit the encoder once on all the values in the DataFrame first, then pass its transform method to apply instead of fit_transform. You can try the following:
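from sklearn.preprocessing import LabelEncoder
encoder = LabelEncoder()
all_values = String_df.values.ravel()  # convert the dataframe to one long array
encoder.fit(all_values)  # learn a single vocabulary from every column
labeled_df = String_df.apply(encoder.transform)  # encode each column without refitting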
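Here is a runnable sketch of that fix on a small made-up String_df (the data are hypothetical). With one shared vocabulary, 'c' gets the same code in both columns; as a side effect, a label that was not present during fit makes transform raise a ValueError instead of being silently given a new code:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical data: 'c' appears in both columns.
String_df = pd.DataFrame({"x": ["a", "c", "c"], "y": ["c", "d", "e"]})

encoder = LabelEncoder()
encoder.fit(String_df.values.ravel())            # fit once on every value in the frame
labeled_df = String_df.apply(encoder.transform)  # transform only, no refitting
print(labeled_df)

# Labels that were not seen during fit are rejected.
try:
    encoder.transform(["z"])
    rejected = False
except ValueError:
    rejected = True
print("unseen label rejected:", rejected)
```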