I compared pandas.unique() and numpy.unique(), and I found that the latter actually outperforms the former.
I am not sure whether the performance gap scales linearly with the input size.
Can anyone please tell me why such a difference exists, with regard to the code implementation? In which cases should I use which?
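np.unique() treats the data as a plain array: by default it sorts the values and then drops adjacent duplicates, so it costs roughly O(n log n) and returns the unique values in sorted order.
pd.unique(), by contrast, is hash-table based: it makes a single pass over the data, returns the values in order of first appearance, and does not sort. pandas does not keep precomputed 'unique' metadata, so the result is calculated on every call. The pandas documentation describes pd.unique() as significantly faster than numpy.unique for long enough sequences; on small inputs fixed overhead can dominate, so either one may win. Use np.unique() when you need sorted output or its extras (return_counts, return_index, return_inverse), and pd.unique() when you only need the distinct values.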
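A small sketch for checking this on your own machine. It shows one observable difference (np.unique() returns sorted values, pd.unique() keeps the order of first appearance) and times both functions. The sample array, the input sizes, and the helper name time_unique are illustrative assumptions, not part of the original question, and the timings will vary with dtype, cardinality, and library versions.

```python
import timeit

import numpy as np
import pandas as pd

values = np.array([3, 1, 2, 3, 1])

# np.unique returns the distinct values in sorted order.
sorted_uniques = np.unique(values)
# pd.unique returns the distinct values in order of first appearance.
appearance_uniques = pd.unique(values)

print(sorted_uniques)
print(appearance_uniques)


def time_unique(n, repeats=3):
    """Return the best-of-`repeats` wall time (seconds) for each function.

    Hypothetical helper: random integers in [0, 1000) are an assumed workload.
    """
    rng = np.random.default_rng(0)
    data = rng.integers(0, 1000, size=n)
    t_np = min(timeit.repeat(lambda: np.unique(data), number=1, repeat=repeats))
    t_pd = min(timeit.repeat(lambda: pd.unique(data), number=1, repeat=repeats))
    return t_np, t_pd


for n in (1_000, 100_000):
    t_np, t_pd = time_unique(n)
    print(f"n={n}: np.unique {t_np:.6f}s, pd.unique {t_pd:.6f}s")
```

Running it at several sizes is the simplest way to see whether the gap grows with the input, since the answer depends on the data rather than on a fixed ratio.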