
Itemgetter Except Columns

Tags: python

Is there a way to get the complement of a set of columns using itemgetter?

For example, you can get the first, third, and fifth elements of a list using

from operator import itemgetter
f = itemgetter(0, 2, 4)
f(['a', 'b', 'c', 'd', 'e']) ## == ('a', 'c', 'e')

Is there a (simple and performant) way to get all of the elements except for the first, third and fifth?

Michael K asked Nov 07 '18 17:11




3 Answers

No, there is no way to spell "everything but these indices" in Python.

You'd have to lock down the length of all inputs and hardcode the included indices, e.g. itemgetter(*(i for i in range(fixed_list_length) if i not in {0, 2, 4})), but then you could only process objects of that specific length.
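A minimal sketch of that fixed-length approach, assuming every input has exactly five elements. The names FIXED_LENGTH, EXCLUDED and keep are illustrative and not part of the question:

```python
from operator import itemgetter

FIXED_LENGTH = 5       # assumption: every input has exactly this many elements
EXCLUDED = {0, 2, 4}   # indices to leave out

# Precompute the indices to keep once, then build a single reusable itemgetter.
keep = [i for i in range(FIXED_LENGTH) if i not in EXCLUDED]
f = itemgetter(*keep)

result = f(['a', 'b', 'c', 'd', 'e'])
print(result)  # ('b', 'd')
```

Note that itemgetter() returns a bare item rather than a tuple when only one index is left, and raises TypeError when no index is left, so this only behaves uniformly when at least two indices survive the exclusion.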

If your inputs are of variable length, then one distant option is to use a slice to get everything after index 4:

itemgetter(1, 3, slice(5, None))

but then you'd get a separate list for the slice component:

>>> itemgetter(1, 3, slice(5, None))(['a', 'b', 'c', 'd', 'e', 'f', 'g'])
('b', 'd', ['f', 'g'])

and an error if the input sequence is not at least 4 elements long:

>>> itemgetter(1, 3, slice(5, None))(['a', 'b', 'c'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list index out of range

Rather than use itemgetter(), just use a set and a lambda that uses a list comprehension:

def excludedgetter(*indices):
    excluded = set(indices)
    return lambda seq: [v for i, v in enumerate(seq) if i not in excluded]

That callable can be used for inputs of any length:

>>> from random import randrange
>>> pile = [
...     [randrange(10) for _ in range(randrange(8))]
...     for _ in range(10)
... ]
>>> min(len(l) for l in pile), max(len(l) for l in pile)
(0, 6)
>>> sorted(pile, key=excludedgetter(0, 2, 4))
[[], [1], [9, 1, 8, 2, 4, 0], [0, 3], [7, 3, 4, 9, 7, 7], [8, 4, 4], [6, 4, 7, 9, 9], [0, 5, 3, 7, 2], [4, 6, 6, 0], [8, 8, 1]]

Those random-length lists are no problem.
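Applied to the list from the question, the same idea can be sketched as follows. This variant returns a tuple to mimic itemgetter()'s return type; the name excludedgetter_tuple is hypothetical and only used here for illustration:

```python
def excludedgetter_tuple(*indices):
    """Return a callable that drops the given positions from a sequence."""
    excluded = set(indices)  # set membership tests are O(1) on average
    return lambda seq: tuple(v for i, v in enumerate(seq) if i not in excluded)

f = excludedgetter_tuple(0, 2, 4)

five = f(['a', 'b', 'c', 'd', 'e'])
seven = f(['a', 'b', 'c', 'd', 'e', 'f', 'g'])
one = f(['a'])

print(five)   # ('b', 'd')
print(seven)  # ('b', 'd', 'f', 'g')
print(one)    # ()
```

Unlike the slice-based itemgetter, the trailing elements end up in the same flat result, and inputs that are too short produce an empty result instead of an IndexError.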

Martijn Pieters answered Oct 19 '22 16:10


Since you're asking about itemgetter() specifically: you could use a set to get the difference:

>>> from operator import itemgetter

>>> obj = ['a', 'b', 'c', 'd', 'e']
>>> c = {1, 3, 5}  # Get everything but these
>>> get = set(range(len(obj))).difference(c)
>>> f = itemgetter(*get)
>>> f(obj)
('a', 'c', 'e')

where set(range(len(obj))) is all the indices, i.e. {0, 1, 2, 3, 4}.


Disclaimer: this does not guarantee that the indices come out in order, given that sets are unordered. It is a bit less efficient, but you would be safer with:

f = itemgetter(*sorted(get))

Granted, this requires you to know the length of the list in advance, prior to the call to itemgetter(), and so requires a fresh itemgetter() call for each list you want to index.
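One way to soften that cost, sketched here under the assumption that the excluded indices are fixed up front, is to cache one getter per input length. The helper names getter_for_length and except_items are hypothetical:

```python
from functools import lru_cache
from operator import itemgetter

EXCLUDED = frozenset({0, 2, 4})  # assumption: the indices to skip are fixed

@lru_cache(maxsize=None)
def getter_for_length(n):
    """Build (and cache) a getter for sequences of exactly n elements."""
    keep = sorted(i for i in range(n) if i not in EXCLUDED)
    if not keep:
        # itemgetter() with no arguments raises TypeError, so handle it here.
        return lambda seq: ()
    if len(keep) == 1:
        # itemgetter(i) returns a bare item; wrap it to always return a tuple.
        only = keep[0]
        return lambda seq: (seq[only],)
    return itemgetter(*keep)

def except_items(seq):
    """Return every element of seq except those at the EXCLUDED positions."""
    return getter_for_length(len(seq))(seq)

r5 = except_items(['a', 'b', 'c', 'd', 'e'])
r2 = except_items(['a', 'b'])
r0 = except_items([])
print(r5, r2, r0)  # ('b', 'd') ('b',) ()
```

The getter is still rebuilt for each new length, but only once per length, so repeated calls on inputs of the same size reuse the cached itemgetter.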

Brad Solomon answered Oct 19 '22 18:10


You're looking for a quasi-vectorised operation. This isn't possible with regular Python, or even with 3rd party NumPy where the result is an array. But the latter does offer syntactic benefits:

import numpy as np

A = ['a', 'b', 'c', 'd', 'e']

exc = [0, 2, 4]

res1 = [val for idx, val in enumerate(A) if idx not in exc]
res2 = np.delete(A, exc).tolist()

assert res1 == res2

If you use the list comprehension, you should convert exc to a set first to enable O(1) lookup.
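A small sketch of that set conversion, using the same data as above (exc_set is an illustrative name):

```python
A = ['a', 'b', 'c', 'd', 'e']
exc = [0, 2, 4]

# Convert once, outside the comprehension, so each membership test is
# an average O(1) hash lookup instead of an O(len(exc)) list scan.
exc_set = set(exc)
res = [val for idx, val in enumerate(A) if idx not in exc_set]

print(res)  # ['b', 'd']
```

For three excluded indices the difference is negligible; it starts to matter when exc grows large.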

jpp answered Oct 19 '22 16:10