I have applied a LabelEncoder() on a dataframe, which returns the following:
The value order/new_carts gets different label-encoded numbers, such as 70, 64, and 71.
Is this inconsistent labeling, or did I do something wrong somewhere?
LabelEncoder works on one-dimensional arrays. If you apply it to multiple columns, it will be consistent within columns but not across columns.
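To see the per-column behaviour concretely, here is a minimal sketch. The two-column frame and the names per_column, code_b_in_x and code_b_in_y are made up for illustration; they are not from the question.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy frame: the value 'b' appears in both columns.
df = pd.DataFrame({"x": ["a", "b", "c"], "y": ["b", "c", "d"]})

# apply() hands one column at a time to fit_transform, so the encoder
# is refitted on each column's own set of values.
per_column = df.apply(LabelEncoder().fit_transform)

code_b_in_x = per_column.loc[1, "x"]  # 'b' is the 2nd of (a, b, c) -> 1
code_b_in_y = per_column.loc[0, "y"]  # 'b' is the 1st of (b, c, d) -> 0
print(code_b_in_x, code_b_in_y)
```

The same string ends up with two different codes, which is the effect described in the question.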
As a workaround, you can flatten the dataframe to a one-dimensional array and fit LabelEncoder on that array.
Assume this is the dataframe:
df
Out[372]: 
   0  1  2
0  d  d  a
1  c  a  c
2  c  c  b
3  e  e  d
4  d  d  e
5  d  b  e
6  e  e  b
7  a  e  b
8  b  c  c
9  e  a  b
With ravel and then reshaping:
pd.DataFrame(LabelEncoder().fit_transform(df.values.ravel()).reshape(df.shape), columns = df.columns)
Out[373]: 
   0  1  2
0  3  3  0
1  2  0  2
2  2  2  1
3  4  4  3
4  3  3  4
5  3  1  4
6  4  4  1
7  0  4  1
8  1  2  2
9  4  0  1
Edit:
If you want to store the labels, you need to save the LabelEncoder object.
le = LabelEncoder()
df2 = pd.DataFrame(le.fit_transform(df.values.ravel()).reshape(df.shape), columns = df.columns)
Now, le.classes_ gives you the classes (starting from 0).
le.classes_
Out[390]: array(['a', 'b', 'c', 'd', 'e'], dtype=object)
If you want to access the integer by label, you can construct a dict:
dict(zip(le.classes_, np.arange(len(le.classes_))))
Out[388]: {'a': 0, 'b': 1, 'c': 2, 'd': 3, 'e': 4}
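You can do the same with the transform method, without building a dict. Recent scikit-learn versions expect a 1-D array-like here, so pass the label in a list:
le.transform(['c'])
Out[395]: array([2])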
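Putting the pieces together, the following is a self-contained sketch of looking labels up in both directions with one shared encoder. The small frame is a shortened stand-in for the one above, and label_to_code, code_of_c and decoded are illustrative names. The inverse_transform call is not used in the answer itself; it is the LabelEncoder method for going from codes back to labels.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Shortened stand-in for the example frame (assumption, for illustration).
df = pd.DataFrame([["d", "d", "a"],
                   ["c", "a", "c"],
                   ["c", "c", "b"],
                   ["e", "e", "d"]])

# Fit one encoder on every value in the frame, then restore the 2-D shape.
le = LabelEncoder()
codes = le.fit_transform(df.values.ravel()).reshape(df.shape)
df2 = pd.DataFrame(codes, columns=df.columns, index=df.index)

# Label -> code: a label's position in classes_ is its code.
label_to_code = {label: code for code, label in enumerate(le.classes_)}

# Same lookup through the encoder; transform expects a 1-D array-like.
code_of_c = le.transform(["c"])[0]

# Code -> label for the whole frame.
decoded = pd.DataFrame(
    le.inverse_transform(df2.values.ravel()).reshape(df2.shape),
    columns=df2.columns,
    index=df2.index,
)
```

Because the mapping lives in the fitted le object, keeping that object around is enough to decode df2 later.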
Because of the way the apply and fit_transform functions work, you are accidentally calling the fit function on each column of your frame. Let's walk through what's happening in the following line:
labeled_df = String_df.apply(LabelEncoder().fit_transform)
You create a new LabelEncoder object and call apply, passing in its fit_transform method. For each column in your DataFrame, apply calls fit_transform on your encoder with the column as the argument. This does two things: it refits the encoder on that column, replacing whatever it had learned before, and it returns the codes for that column based on the new fit. The codes will not be consistent across columns because each time you call fit_transform the LabelEncoder object can choose new transformation codes.
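One way to observe the refitting is to keep a reference to the encoder and inspect it after apply. This is only a sketch: the two-column frame and the name last_classes are assumptions made for illustration.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical frame whose columns share no values.
String_df = pd.DataFrame({"first": ["a", "b", "c"],
                          "second": ["x", "y", "x"]})

encoder = LabelEncoder()
labeled_df = String_df.apply(encoder.fit_transform)

# Each fit_transform call overwrote the previous fit, so the encoder
# now only knows the values of the last column it saw.
last_classes = list(encoder.classes_)
print(last_classes)
```

The encoder has no memory of the values in the first column, so it cannot be used to decode that column afterwards.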
To keep the codes consistent across columns, fit the LabelEncoder on all the values in the frame first. Then pass the transform method to apply, instead of the fit_transform method. You can try the following:
encoder = LabelEncoder()
all_values = String_df.values.ravel()  # flatten the DataFrame into one long 1-D array
encoder.fit(all_values)
labeled_df = String_df.apply(encoder.transform)
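For a version that can be run as-is, here is the same approach on a small made-up frame. The data is an assumption for illustration; String_df, encoder and labeled_df mirror the names used above.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical frame: 'b' and 'c' appear in both columns.
String_df = pd.DataFrame({"first": ["a", "b", "c"],
                          "second": ["b", "c", "d"]})

encoder = LabelEncoder()
# Fit once on every value in the frame, so there is a single mapping.
encoder.fit(String_df.values.ravel())

# transform() only looks codes up; it does not refit the encoder.
labeled_df = String_df.apply(encoder.transform)
print(labeled_df)
```

Here 'b' maps to 1 and 'c' maps to 2 in both columns, because the encoder was fitted only once.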