Assign unique id to list of lists in python where duplicates get the same id

Tags:

I have a list of lists (which can contain up to 90k elements)

[[1,2,3], [1,2,4], [1,2,3], [1,2,4], [1,2,5]]

I would like to assign an id to each elements, where the id is unique, except when the item is duplicated. So for the list above, I'd need this back:

[0,1,0,1,2]

What is the most effective way to do this?

234

asked Jul 10 '16 11:07

jamborta

1 Answers

Keep a map of already seen elements with the associated id.

from itertools import count
from collections import defaultdict


mapping = defaultdict(count().__next__)
result = []
for element in my_list:
    result.append(mapping[tuple(element)])

you could also use a list-comprehension:

result = [mapping[tuple(element)] for element in my_list]

Unfortunately lists aren't hashable so you have to convert them to a tuple when storing them as keys of the mapping.

Note the trick of using defaultdict, and count().__next__ to provide unique increasing ids. On python2 you have to replace .__next__ with .next.

The defaultdict will assign a default value when it cannot find a key. The default value is obtained by calling the function provided in the constructor. In this case the __next__ method of the count() generator yields increasing numbers.

As a more portable alternative you could do:

from functools import partial

mapping = defaultdict(partial(next, count()))

An alternative solution, as proposed in the comments, is to just use the index as unique id:

result = [my_list.index(el) for el in my_list]

This is imple however:

It takes O(N^2) time instead of O(N)
The ids are unique, increasing but not consecutive (which may or may not be a problem)

For comparison of the two solutions see:

In [1]: from itertools import count
   ...: from collections import defaultdict

In [2]: def hashing(seq):
   ...:         mapping = defaultdict(count().__next__)
   ...:         return [mapping[tuple(el)] for el in seq]
   ...: 

In [3]: def indexing(seq):
   ...:    return [seq.index(i) for i in seq]
   ...: 

In [4]: from random import randint

In [5]: seq = [[randint(1, 20), randint(1, 20), randint(1, 20)] for _ in range(90000)]

In [6]: %timeit hashing(seq)
10 loops, best of 3: 37.7 ms per loop

In [7]: %timeit indexing(seq)
1 loop, best of 3: 26 s per loop

Note how for a 90k element list the mapping solution takes less 40 milliseconds while the indexing solution takes 26 seconds.

answered Oct 10 '22 02:10

Bakuriu

Related questions
                            
                                One line tree implementation
                            
                                Labelling a matplotlib histogram bin with an arrow
                            
                                Execute a Jupyter notebook including inline markdown with nbconvert
                            
                                concat a DataFrame with a Series in Pandas
                            
                                Multiple signup, registration forms using django-allauth
                            
                                Wait for condition in given timeout
                            
                                how to use ajax function to send form without page getting refreshed, what am I missing?Do I have to use rest-framework for this?
                            
                                Python click pass unspecified number of kwargs [duplicate]
                            
                                pyqt5 - closing/terminating application
                            
                                pandas multi-index how to mask the data by the second level
                            
                                Plot contours of distribution on all three axes in 3D plot
                            
                                TensorFlow: Hadamard Product:: How do I get this?
                            
                                The difference between np.random.seed(int) and np.random.seed(array_like)?
                            
                                Aligning table to x-axis using matplotlib python
                            
                                Comparing pandas DataFrame to Series
                            
                                Run python program without 'python' in Windows [duplicate]
                            
                                Flask server cannot read file uploaded by POST request
                            
                                Django run code on application start but not on migrations
                            
                                scikit-learn: don't separate hyphenated words while tokenization
                            
                                How to set common prefix for all tables in SQLAlchemy

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Assign unique id to list of lists in python where duplicates get the same id

Tags:

python

list

counting

jamborta

People also ask

1 Answers

Bakuriu

Recent Activity

Donate For Us