Why is pd.unique() faster than np.unique()?

I compared pandas.unique() with numpy.unique() and found that the former is actually faster than the latter.
I am not sure whether the speed advantage grows linearly with the input size.

Can anyone please tell me why such a difference exists, with regards to the code implementation? In what case should I use which?

Songcheng Li asked Oct 29 '22 00:10


1 Answer

np.unique() has traditionally been sort-based: it flattens the array, sorts it, and then keeps every value that differs from its predecessor. The sort costs O(n log n) time, and the unique values come back in sorted order.
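As a rough illustration of the sort-then-compare idea behind np.unique(), here is a minimal sketch. The helper name sort_based_unique is hypothetical and this is not NumPy's actual implementation; it only shows why a sort dominates the cost.

```python
import numpy as np

def sort_based_unique(values):
    """Hypothetical helper: sort, then keep elements that differ from their left neighbour."""
    arr = np.sort(np.asarray(values).ravel())  # O(n log n) step
    if arr.size == 0:
        return arr
    keep = np.ones(arr.size, dtype=bool)       # the first element is always kept
    keep[1:] = arr[1:] != arr[:-1]             # True where a new value starts
    return arr[keep]

data = np.array([3, 1, 3, 2, 1])
result = sort_based_unique(data)
print(result)  # [1 2 3]
```

The output is sorted as a side effect of the algorithm, which matches what np.unique() returns for the same input.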

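pd.unique(), by contrast, is hash-table based. It makes a single pass over the data, records each value the first time it appears, and never sorts, so it runs in roughly O(n) time and returns the values in order of appearance. It does not rely on precomputed metadata; the hash table is built on each call. The pandas documentation itself notes that it is significantly faster than numpy.unique for long enough sequences.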
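To see the difference in practice, the sketch below compares the two functions on the same input. The helper hash_based_unique is a hypothetical pure-Python stand-in for a hash-table approach (pandas does this in compiled code), and the array size and value range are arbitrary choices. Timings depend on the machine and on library versions, so none are claimed here.

```python
import timeit
import numpy as np
import pandas as pd

def hash_based_unique(values):
    """Hypothetical helper: one pass, remembering values already seen in a set."""
    seen = set()
    out = []
    for v in values:
        if v not in seen:   # average O(1) hash lookup
            seen.add(v)
            out.append(v)   # keeps order of first appearance
    return out

data = np.array([3, 1, 3, 2, 1])
order_pd = pd.unique(data)
order_np = np.unique(data)
order_sketch = hash_based_unique(data.tolist())
print(order_pd)  # [3 1 2]  (order of appearance)
print(order_np)  # [1 2 3]  (sorted)

# Rough timing on a larger array; sizes here are arbitrary assumptions.
rng = np.random.default_rng(0)
big = rng.integers(0, 1000, size=200_000)
t_pd = timeit.timeit(lambda: pd.unique(big), number=5)
t_np = timeit.timeit(lambda: np.unique(big), number=5)
print(f"pd.unique: {t_pd:.4f}s  np.unique: {t_np:.4f}s")
```

As a rule of thumb, pd.unique() suits 1-D data where speed or order of appearance matters, while np.unique() is the better fit when sorted output or its extra options (return_index, return_inverse, return_counts, axis) are needed.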

Dylan McCullough answered Nov 15 '22 07:11