I have a list, say l = [10, 10, 20, 15, 10, 20]. I want to assign each unique value a certain "index", to get [1, 1, 2, 3, 1, 2].
This is my code:

a = list(set(l))
res = [a.index(x) for x in l]
This turns out to be very slow. l has 1M elements and 100K unique elements. I have also tried map with lambda and sorting, which did not help. What is the ideal way to do this?
You can do this in O(N) time using a defaultdict and a list comprehension:
>>> from itertools import count
>>> from collections import defaultdict
>>> lst = [10, 10, 20, 15, 10, 20]
>>> d = defaultdict(count(1).next)
>>> [d[k] for k in lst]
[1, 1, 2, 3, 1, 2]
In Python 3, use __next__ instead of next.
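For reference, the equivalent session in Python 3 would look like this:

>>> from itertools import count
>>> from collections import defaultdict
>>> lst = [10, 10, 20, 15, 10, 20]
>>> d = defaultdict(count(1).__next__)
>>> [d[k] for k in lst]
[1, 1, 2, 3, 1, 2]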
If you're wondering how this works: the default_factory (i.e. count(1).next in this case) passed to defaultdict is called only when Python encounters a missing key. So for the first 10 the factory is called and the value is 1; the next 10 is no longer a missing key, so the previously computed 1 is reused. Then 20 is again a missing key, so Python calls the default_factory again to get its value, and so on.
d at the end will look like this:

>>> d
defaultdict(<method-wrapper 'next' of itertools.count object at 0x1057c83b0>, {10: 1, 20: 2, 15: 3})
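To see exactly when the factory runs, here is a small illustrative sketch; the wrapper function and its print calls are my own addition, not part of the original answer:

from itertools import count
from collections import defaultdict

counter = count(1)

def factory():
    # Only invoked by defaultdict when a key is missing.
    value = next(counter)
    print("factory called, returning", value)
    return value

d = defaultdict(factory)
d[10]  # prints: factory called, returning 1
d[10]  # no print; 10 is already present
d[20]  # prints: factory called, returning 2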
The slowness of your code arises because a.index(x) performs a linear search, and you perform that linear search for each of the elements in l. So for each of the 1M items you perform up to 100K comparisons.
The fastest way to transform one value to another is to look it up in a map. You'll need to create the map and fill in the relationship between the original values and the values you want, then retrieve the mapped value each time you encounter the same original value in your list.
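To make the difference concrete, here is a rough timing sketch; the sizes are scaled down from the question's 1M/100K so the slow version finishes quickly, and exact numbers will vary by machine:

import random
import time

# Scaled-down synthetic data: 100K elements, ~1K unique values.
l = [random.randrange(1_000) for _ in range(100_000)]

# Original approach: list.index is a linear scan, so this is O(N * K).
a = list(set(l))
t0 = time.perf_counter()
res_slow = [a.index(x) for x in l]
print("list.index:", time.perf_counter() - t0, "s")

# Map-based approach: average O(1) dict lookups, so this is O(N).
lookup = {v: i for i, v in enumerate(a)}
t0 = time.perf_counter()
res_fast = [lookup[x] for x in l]
print("dict lookup:", time.perf_counter() - t0, "s")

assert res_slow == res_fast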
Here is an example that makes a single pass through l. There may be room for further optimization to eliminate the need to repeatedly reallocate res when appending to it (see the sketch after the code).
res = []
conversion = {}
i = 1  # start at 1 to match the desired output [1, 1, 2, 3, 1, 2]
for x in l:
    if x not in conversion:
        # First time we see x: assign it the next index.
        value = conversion[x] = i
        i += 1
    else:
        # Seen before: reuse its index.
        value = conversion[x]
    res.append(value)
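One possible refinement along those lines, offered as a sketch rather than part of the original answer: preallocate res to its final length so append() never has to grow the list.

res = [0] * len(l)  # preallocated; never grows during the loop
conversion = {}
i = 1
for pos, x in enumerate(l):
    value = conversion.get(x)
    if value is None:
        # Safe sentinel: assigned indices start at 1, so None never collides.
        value = conversion[x] = i
        i += 1
    res[pos] = value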