I have a python list where elements can repeat.
>>> a = [1,2,2,3,3,4,5,6]
I want to get the first n
unique elements from the list. So, in this case, if i want the first 5 unique elements, they would be:
[1,2,3,4,5]
I have come up with a solution using generators:
def iterate(itr, upper=5): count = 0 for index, element in enumerate(itr): if index==0: count += 1 yield element elif element not in itr[:index] and count<upper: count += 1 yield element
In use:
>>> i = iterate(a, 5) >>> [e for e in i] [1,2,3,4,5]
I have doubts on this being the most optimal solution. Is there an alternative strategy that i can implement to write it in a more pythonic and efficient way?
To access the first n elements from a list, we can use the slicing syntax [ ] by passing a 0:n as an arguments to it . 0 is the start index (it is inculded).
Method #1 : Using sorted() + lambda The combination of above functionality can be used to perform this particular task. In this, we just employ sorted function with reverse flag true, and print the top N elements using list slicing.
python1min read To access the first element (12) of a list, we can use the subscript syntax [ ] by passing an index 0 . In Python lists are zero-indexed, so the first element is available at index 0 . Similarly, we can also use the slicing syntax [:1] to get the first element of a list in Python.
I would use a set
to remember what was seen and return from the generator when you have seen
enough:
a = [1, 2, 2, 3, 3, 4, 5, 6] def get_unique_N(iterable, N): """Yields (in order) the first N unique elements of iterable. Might yield less if data too short.""" seen = set() for e in iterable: if e in seen: continue seen.add(e) yield e if len(seen) == N: return k = get_unique_N([1, 2, 2, 3, 3, 4, 5, 6], 4) print(list(k))
Output:
[1, 2, 3, 4]
According to PEP-479 you should return
from generators, not raise StopIteration
- thanks to @khelwood & @iBug for that piece of comment - one never learns out.
With 3.6 you get a deprecated warning, with 3.7 it gives RuntimeErrors: Transition Plan if still using raise StopIteration
Your solution using elif element not in itr[:index] and count<upper:
uses O(k)
lookups - with k
being the length of the slice - using a set reduces this to O(1)
lookups but uses more memory because the set has to be kept as well. It is a speed vs. memory tradeoff - what is better is application/data dependend.
Consider [1, 2, 3, 4, 4, 4, 4, 5]
vs [1] * 1000 + [2] * 1000 + [3] * 1000 + [4] * 1000 + [5] * 1000 + [6]
:
For 6 uniques (in longer list):
O(1)+O(2)+...+O(5001)
5001*O(1)
lookup + memory for set( {1, 2, 3, 4, 5, 6})
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With