 

Does Python's list(set(a)) change its order every time?

Tags: python, list, set

I have a list of 5 million string elements, which are stored as a pickle object.

a = ['https://en.wikipedia.org/wiki/Data_structure','https://en.wikipedia.org/wiki/Data_mining','https://en.wikipedia.org/wiki/Statistical_learning_theory','https://en.wikipedia.org/wiki/Machine_learning','https://en.wikipedia.org/wiki/Computer_science','https://en.wikipedia.org/wiki/Information_theory','https://en.wikipedia.org/wiki/Statistics','https://en.wikipedia.org/wiki/Mathematics','https://en.wikipedia.org/wiki/Signal_processing','https://en.wikipedia.org/wiki/Sorting_algorithm','https://en.wikipedia.org/wiki/Data_structure','https://en.wikipedia.org/wiki/Quicksort','https://en.wikipedia.org/wiki/Merge_sort','https://en.wikipedia.org/wiki/Heapsort','https://en.wikipedia.org/wiki/Insertion_sort','https://en.wikipedia.org/wiki/Introsort','https://en.wikipedia.org/wiki/Selection_sort','https://en.wikipedia.org/wiki/Timsort','https://en.wikipedia.org/wiki/Cubesort','https://en.wikipedia.org/wiki/Shellsort']

To remove duplicates, I use set(a), then turn the result back into a list with list(set(a)).

My question is:

If I restart Python and read the list from the pickle file again, will the order of list(set(a)) be the same every time?

I'm eager to know how this hash -> list ordering works.


I tested with a small dataset and it seems to have a consistent ordering.

In [50]: a = ['x','y','z','k']

In [51]: a
['x', 'y', 'z', 'k']

In [52]: list(set(a))
['y', 'x', 'k', 'z']

In [53]: b=list(set(a))

In [54]: list(set(b))
['y', 'x', 'k', 'z']

In [55]: del b

In [56]: b=list(set(a))

In [57]: b
['y', 'x', 'k', 'z']
aerin asked May 04 '16 20:05




1 Answer

I would suggest using an auxiliary set() only to enforce uniqueness when adding items to the list. That preserves the order of your list, and the set itself is never stored.

1. First, load your list and create a set from its contents.
2. Before adding an item to your list, check that it is not already in the set (a membership test with "in" is much faster on a set than on a list, especially when there are many elements).
3. Pickle your list; the order will be exactly the one you want.
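The steps above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the helper name dedupe_keep_order and the short sample list are assumptions made for the example, and the pickle round-trip happens in memory instead of through a file.

```python
import pickle


def dedupe_keep_order(items):
    """Return a new list without duplicates, keeping first occurrences in order.

    Hypothetical helper name, used only for this illustration.
    """
    seen = set()      # auxiliary set, used only for fast membership tests
    unique = []       # the ordered result that actually gets stored
    for item in items:
        if item not in seen:
            seen.add(item)
            unique.append(item)
    return unique


a = ['x', 'y', 'z', 'x', 'k', 'y']
unique = dedupe_keep_order(a)
print(unique)  # ['x', 'y', 'z', 'k']

# A list keeps its order through pickling, so the restored list matches.
restored = pickle.loads(pickle.dumps(unique))
print(restored == unique)  # True
```

Only the list is pickled; the set can be rebuilt from the list after loading if more items need to be added later.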

Drawback: this takes roughly twice as much memory as handling only a set().
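As for why the order of list(set(a)) should not be relied on across restarts: a set iterates in an order that depends on the hashes of its elements, and since Python 3.3 string hashes are randomized per process by default. The sketch below is one way to observe this, assuming Python 3.7 or later; it runs the same one-liner in fresh interpreters with different PYTHONHASHSEED values. The function name order_with_seed is made up for the example.

```python
import ast
import os
import subprocess
import sys

# The code each child interpreter runs: build a set of strings, print its order.
SNIPPET = "print(list(set(['x', 'y', 'z', 'k'])))"


def order_with_seed(seed):
    """Run SNIPPET in a fresh interpreter with a fixed hash seed (hypothetical helper)."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", SNIPPET],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return ast.literal_eval(result.stdout.strip())


# The same seed reproduces the same order; different seeds may give different orders.
for seed in (1, 2, 3):
    print(seed, order_with_seed(seed))
```

Within a single interpreter session the hashes stay fixed, which is consistent with the order looking stable in a quick interactive test.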

Jean-François Fabre answered Sep 24 '22 02:09