I have this list of lists:
cont_det = [['TASU 117000 0', "TGHU 759933 - 0", 'CSQU3054383', 'BMOU 126 780-0', "HALU 2014 13 3"], ['40HS'], ['Ha2ardous Materials', 'Arm5 Maehinery']]
In practice, cont_det is a huge list with many sub-lists of irregular length; this is just a sample case for demonstration. I want to get the following output:
[['TASU 117000 0', '40HS', 'Ha2ardous Materials'],
['TGHU 759933 - 0', '40HS', 'Arm5 Maehinery'],
['CSQU3054383', '40HS', 'Ha2ardous Materials'],
['BMOU 126 780-0', '40HS', 'Ha2ardous Materials'],
['HALU 2014 13 3', '40HS', 'Ha2ardous Materials']]
The logic is to apply zip_longest to the list of lists, but whenever a sub-list is shorter than the longest sub-list (length 5 here, for the first sub-list), fill it with the first item of that sub-list instead of the default fillvalue=None. In the second sub-list all filled values are therefore the same, and in the third one the last three values are filled with its first value.
I got the result with this code:
from itertools import zip_longest as zilo
from more_itertools import padded as pad

max_ = len(max(cont_det, key=len))  # length of the longest sub-list
for i, cont_row in enumerate(cont_det):
    # compare the length of the current sub-list, not of the outer list
    if len(cont_row) != max_:
        cont_det[i] = list(pad(cont_row, cont_row[0], max_))
cont_det = list(map(list, zilo(*cont_det)))
This gives me the expected result. Had I instead used list(zilo(*cont_det, fillvalue='')), I would have gotten this:
[('TASU 117000 0', '40HS', 'Ha2ardous Materials'),
('TGHU 759933 - 0', '', 'Arm5 Maehinery'),
('CSQU3054383', '', ''),
('BMOU 126 780-0', '', ''),
('HALU 2014 13 3', '', '')]
Is there any other way (such as mapping a function) to supply the fillvalue parameter of zip_longest, so that I don't have to loop over the list and pad each sub-list to the length of the longest one beforehand, and the whole thing can be done in one line with zip_longest alone?
You can peek into each of the iterators via next in order to extract the first item (the "head"), then create a sentinel object that marks the end of the iterator, and finally chain everything back together in the following way: head -> remainder_of_iterator -> sentinel -> it.repeat(head).
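As a rough illustration of that chaining for a single iterable, here is a minimal sketch. The helper name extend_with_head is hypothetical (it is not part of the solution further down), and islice is only used to cut off the infinite tail:

```python
import itertools as it

sentinel = object()  # unique end-of-data marker


def extend_with_head(iterable):
    """Hypothetical helper: head -> remainder -> sentinel -> head, head, ..."""
    iterator = iter(iterable)
    head = next(iterator)  # peek at the first item (iterable must be non-empty)
    return it.chain((head,), iterator, (sentinel,), it.repeat(head))


# The resulting iterator is infinite, so only look at the first six items.
first_six = list(it.islice(extend_with_head(['a', 'b']), 6))
print(['<sentinel>' if x is sentinel else x for x in first_six])
# ['a', 'b', '<sentinel>', 'a', 'a', 'a']
```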
This uses it.repeat to replay the first item ad infinitum once the end of the iterator has been reached, so we need a way to stop that process once the last iterator hits its sentinel object. For this we can (ab)use the fact that map stops iterating if the mapped function raises (or leaks) a StopIteration, such as from next invoked on an already exhausted iterator. Alternatively, we can use the 2-argument form of iter to stop on a sentinel object (see below).
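Both stopping mechanisms can be tried in isolation. The following is only a small sketch; the names budget, mapped and source are made up for the demonstration:

```python
import itertools as it

# 1) A StopIteration leaking out of the mapped function ends the map iterator.
budget = it.repeat(None, 2)  # can be consumed exactly twice
mapped = map(lambda x: (next(budget), x)[1], 'abcd')
stopped_by_leak = list(mapped)
print(stopped_by_leak)  # ['a', 'b']

# 2) iter(callable, sentinel) keeps calling the callable until it returns the sentinel.
sentinel = object()
source = iter(['x', 'y', sentinel, 'z'])
stopped_by_sentinel = list(iter(source.__next__, sentinel))
print(stopped_by_sentinel)  # ['x', 'y']
```

In the first case the third call to next(budget) raises StopIteration inside the lambda, which the consumer of the map object interprets as normal exhaustion.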
So we can map the chained iterators over a function that checks for each item whether it is sentinel and performs the following steps:
- if item is sentinel, then consume a dedicated iterator that yields one item fewer than the total number of iterators via next (hence leaking StopIteration for the last sentinel) and replace the sentinel with the corresponding head;
- else, just return the original item.
Finally, we can just zip the iterators together: it will stop when the last one hits its sentinel object, i.e. it performs a "zip-longest".
In summary, the following function performs the steps described above:
import itertools as it

def solution(*iterables):
    iterators = [iter(i) for i in iterables]  # make sure we're operating on iterators
    heads = [next(i) for i in iterators]  # requires each of the iterables to be non-empty
    sentinel = object()
    iterators = [it.chain((head,), iterator, (sentinel,), it.repeat(head))
                 for iterator, head in zip(iterators, heads)]
    # Create a dedicated iterator object that will be consumed each time a 'sentinel' object is found.
    # For the sentinel corresponding to the last iterator in 'iterators' this will leak a StopIteration.
    running = it.repeat(None, len(iterators) - 1)
    iterators = [map(lambda x, h: next(running) or h if x is sentinel else x,  # StopIteration causes the map to stop iterating
                     iterator, it.repeat(head))
                 for iterator, head in zip(iterators, heads)]
    return zip(*iterators)
If leaking StopIteration from the mapped function in order to terminate the map iterator feels too awkward, then we can slightly modify the definition of running to yield an additional sentinel and use the 2-argument form of iter in order to stop on sentinel:
running = it.chain(it.repeat(None, len(iterators) - 1), (sentinel,))
iterators = [...] # here the conversion to map objects remains unchanged
return zip(*[iter(i.__next__, sentinel) for i in iterators])
If the name resolution for sentinel and running from inside the mapped function is a concern, they can be included as arguments to that function:
iterators = [map(lambda x, h, s, r: next(r) or h if x is s else x,
                 iterator, it.repeat(head), it.repeat(sentinel), it.repeat(running))
             for iterator, head in zip(iterators, heads)]
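Putting the fragments above together, the variant that stops on the sentinel via the 2-argument form of iter and passes sentinel and running as arguments might look as follows. This is a sketch assembled from the snippets in this answer; the function name solution_iter is made up here to distinguish it from the first version:

```python
import itertools as it


def solution_iter(*iterables):
    iterators = [iter(i) for i in iterables]
    heads = [next(i) for i in iterators]  # each iterable must be non-empty
    sentinel = object()
    iterators = [it.chain((head,), iterator, (sentinel,), it.repeat(head))
                 for iterator, head in zip(iterators, heads)]
    # The first n - 1 sentinels are replaced by their head (next(r) is None);
    # for the n-th one, next(r) returns the sentinel itself, which is passed through.
    running = it.chain(it.repeat(None, len(iterators) - 1), (sentinel,))
    iterators = [map(lambda x, h, s, r: (next(r) or h) if x is s else x,
                     iterator, it.repeat(head), it.repeat(sentinel), it.repeat(running))
                 for iterator, head in zip(iterators, heads)]
    # iter(callable, sentinel) stops as soon as the callable returns the sentinel.
    return zip(*[iter(i.__next__, sentinel) for i in iterators])


cont_det = [
    ['TASU 117000 0', 'TGHU 759933 - 0', 'CSQU3054383', 'BMOU 126 780-0', 'HALU 2014 13 3'],
    ['40HS'],
    ['Ha2ardous Materials', 'Arm5 Maehinery'],
]
result = [list(row) for row in solution_iter(*cont_det)]
for row in result:
    print(row)
```

With the sample data from the question this should print the five rows of the expected output, each short sub-list being padded with its own first item.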
That looks like some sort of "matrix rotation".
I've done it without any libraries to keep it clear for everybody. It is pretty straightforward, in my opinion.
from pprint import pprint

cont_det = [
    ['TASU 117000 0', "TGHU 759933 - 0", 'CSQU3054383', 'BMOU 126 780-0', "HALU 2014 13 3"],
    ['40HS'],
    ['Ha2ardous Materials', 'Arm5 Maehinery'],
]

def rotate_matrix(source):
    result = []
    # let's find the longest sub-list length
    length = max(len(row) for row in source)
    # for every column in sub-lists create a new row in the resulting list
    for column_id in range(0, length):
        result.append([])
        # let's fill the newly created row using the source rows' column data
        for row_id in range(0, len(source)):
            # use the value at column_id if the source row has one ...
            if len(source[row_id]) > column_id:
                result[column_id].append(source[row_id][column_id])
            else:
                # ... otherwise fall back to the row's first value (None for an empty row)
                try:
                    result[column_id].append(source[row_id][0])
                except IndexError:
                    result[column_id].append(None)
    return result

pprint(rotate_matrix(cont_det))
And, of course, the script output:
> python test123.py
[['TASU 117000 0', '40HS', 'Ha2ardous Materials'],
['TGHU 759933 - 0', '40HS', 'Arm5 Maehinery'],
['CSQU3054383', '40HS', 'Ha2ardous Materials'],
['BMOU 126 780-0', '40HS', 'Ha2ardous Materials'],
['HALU 2014 13 3', '40HS', 'Ha2ardous Materials']]
I don't quite understand the part about the zip_longest function. Is it a requirement for the solution, or do you just need a solution "which just works"? :) I ask because it doesn't look like zip_longest supports any sort of callback with which we could return the required value "per cell" of the matrix.
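If staying close to zip_longest matters, one possible workaround (a sketch only, not a built-in feature) is to fill with a private marker object and swap it for the row's first item afterwards. The names _MISSING and zip_longest_first are made up for this example, and the rows are assumed to be indexable sequences such as lists:

```python
from itertools import zip_longest

_MISSING = object()  # private fill marker, cannot collide with real data


def zip_longest_first(rows):
    """Hypothetical helper: like zip_longest, but pads each row with its first item."""
    # An empty row has no first item, so it is padded with None instead.
    heads = [row[0] if row else None for row in rows]
    return [[head if value is _MISSING else value
             for value, head in zip(column, heads)]
            for column in zip_longest(*rows, fillvalue=_MISSING)]


cont_det = [
    ['TASU 117000 0', 'TGHU 759933 - 0', 'CSQU3054383', 'BMOU 126 780-0', 'HALU 2014 13 3'],
    ['40HS'],
    ['Ha2ardous Materials', 'Arm5 Maehinery'],
]
rotated = zip_longest_first(cont_det)
for row in rotated:
    print(row)
```

This still walks over every cell once, but the padding happens after the zip rather than before it.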