Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Getting first n unique elements from Python list

I have a python list where elements can repeat.

>>> a = [1,2,2,3,3,4,5,6] 

I want to get the first n unique elements from the list. So, in this case, if i want the first 5 unique elements, they would be:

[1,2,3,4,5] 

I have come up with a solution using generators:

def iterate(itr, upper=5):      count = 0     for index, element in enumerate(itr):         if index==0:             count += 1             yield element          elif element not in itr[:index] and count<upper:             count += 1             yield element 

In use:

>>> i = iterate(a, 5) >>> [e for e in i] [1,2,3,4,5] 

I have doubts on this being the most optimal solution. Is there an alternative strategy that i can implement to write it in a more pythonic and efficient way?

like image 774
xssChauhan Avatar asked Dec 21 '18 16:12

xssChauhan


People also ask

How do you print the first 5 elements of a list in Python?

To access the first n elements from a list, we can use the slicing syntax [ ] by passing a 0:n as an arguments to it . 0 is the start index (it is inculded).

How do you print the top 5 elements of a list in Python?

Method #1 : Using sorted() + lambda The combination of above functionality can be used to perform this particular task. In this, we just employ sorted function with reverse flag true, and print the top N elements using list slicing.

How do you print the first part of a list Python?

python1min read To access the first element (12) of a list, we can use the subscript syntax [ ] by passing an index 0 . In Python lists are zero-indexed, so the first element is available at index 0 . Similarly, we can also use the slicing syntax [:1] to get the first element of a list in Python.


1 Answers

I would use a set to remember what was seen and return from the generator when you have seen enough:

a = [1, 2, 2, 3, 3, 4, 5, 6]      def get_unique_N(iterable, N):     """Yields (in order) the first N unique elements of iterable.      Might yield less if data too short."""     seen = set()     for e in iterable:         if e in seen:             continue         seen.add(e)         yield e         if len(seen) == N:             return              k = get_unique_N([1, 2, 2, 3, 3, 4, 5, 6], 4) print(list(k))      

Output:

[1, 2, 3, 4] 

According to PEP-479 you should return from generators, not raise StopIteration - thanks to @khelwood & @iBug for that piece of comment - one never learns out.

With 3.6 you get a deprecated warning, with 3.7 it gives RuntimeErrors: Transition Plan if still using raise StopIteration


Your solution using elif element not in itr[:index] and count<upper: uses O(k) lookups - with k being the length of the slice - using a set reduces this to O(1) lookups but uses more memory because the set has to be kept as well. It is a speed vs. memory tradeoff - what is better is application/data dependend.

Consider [1, 2, 3, 4, 4, 4, 4, 5] vs [1] * 1000 + [2] * 1000 + [3] * 1000 + [4] * 1000 + [5] * 1000 + [6]:

For 6 uniques (in longer list):

  • you would have lookups of O(1)+O(2)+...+O(5001)
  • mine would have 5001*O(1) lookup + memory for set( {1, 2, 3, 4, 5, 6})
like image 63
Patrick Artner Avatar answered Sep 24 '22 07:09

Patrick Artner