I have a generator for a large set of items. I want to iterate through them once, outputting them to a file. However, with the file format I currently have, I first have to output the number of items I have. I don't want to build a list of the items in memory, as there are too many of them and that would take a lot of time and memory. Is there a way to iterate through the generator, getting its length, but somehow be able to iterate through it again later, getting the same items?
If not, what other solution could I come up with for this problem?
To get the length of a generator in Python, convert it to a list with list() and pass the result to len(), e.g. len(list(gen)). Note that this consumes the generator and builds the entire list in memory, which conflicts with the memory constraint in the question.
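A minimal sketch of both variants (the range-backed generator is just a stand-in for illustration); sum(1 for _ in gen) counts without materializing a list, but it still exhausts the generator:

```python
def gen():  # hypothetical generator, standing in for the real one
    yield from range(1000)

g = gen()
n = len(list(g))       # builds the full list in memory, then counts it
print(n)               # 1000; g is now exhausted

g = gen()
n = sum(1 for _ in g)  # counts item by item without building a list
print(n)               # 1000; g is exhausted here too
```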
Generators are memory-friendly because they produce each item only when it is demanded, instead of storing the whole sequence at once. We can define generators with generator expressions or generator functions.
List comprehensions are usually faster than generator expressions, since generator expressions add an extra layer of overhead to maintain the iterator's state. In practice, though, the difference is often quite small.
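For reference, the two ways of defining a generator look like this (the squares are only an example):

```python
# Generator expression: squares of 0..9, produced lazily
squares_expr = (x * x for x in range(10))

# Generator function: the same values, defined with yield
def squares_func():
    for x in range(10):
        yield x * x

print(list(squares_expr))    # [0, 1, 4, ..., 81]
print(list(squares_func()))  # same output
```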
If you can figure out how to just write a formula to calculate the size based on the parameters that control the generator, do that. Otherwise, I don't think you would save much time.
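For example, if the generator is driven by simple parameters, the count may follow from a closed-form formula; a hypothetical sketch:

```python
def pairs(n):
    # Yields every pair (i, j) with 0 <= i < j < n
    for i in range(n):
        for j in range(i + 1, n):
            yield (i, j)

n = 100
count = n * (n - 1) // 2  # closed-form count: no iteration needed
# write count to the file header, then iterate pairs(n) once for the items
```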
Include the generator here, and we'll try to do it for you!
This cannot be done. Once a generator is exhausted, it needs to be reconstructed in order to be used again. It is possible to define the __len__() method on an iterator object if the number of items is known ahead of time, and then len() can be called against the iterator object.
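A minimal sketch of that idea, assuming the count is known when the iterator is constructed (the class name and wrapped generator are illustrative):

```python
class CountedIterator:
    """Wraps an iterable whose length is known ahead of time."""

    def __init__(self, iterable, length):
        self._it = iter(iterable)
        self._length = length

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._it)

    def __len__(self):
        return self._length

items = CountedIterator((x * x for x in range(1000)), 1000)
print(len(items))   # 1000, without consuming the iterator
for item in items:  # still a single pass over the data
    pass
```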
I don't think that is possible for any generalized iterator. You will need to figure out how the generator was originally constructed and then regenerate it for the final pass.
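One common pattern is to wrap the construction in a function so the generator can be recreated for each pass; a sketch, assuming recreating it is cheap (make_items and the output format are placeholders):

```python
def make_items():  # hypothetical factory; rebuilds the generator on each call
    return (x * x for x in range(1000))

count = sum(1 for _ in make_items())  # first pass: count only

with open("out.txt", "w") as f:
    f.write(f"{count}\n")
    for item in make_items():         # second pass: the same items again
        f.write(f"{item}\n")
```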
Alternatively, you could write out a dummy size to your file, write the items, and then reopen the file for modification and correct the size in the header.
If your file is a binary format, this could work quite well, since the number of bytes for the size is the same regardless of what the actual size is. If it is a text format, you may have to pad the dummy size so it covers the largest possible count; if you can't, you would need to insert extra bytes into the file, which is awkward for text files. See this question for a discussion on inserting and rewriting in a text file using Python.
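A sketch of the dummy-size approach for a binary format, using a fixed-width header so the rewrite never changes the file's length (struct, the 8-byte count, and the item encoding are assumptions, not part of the question's format):

```python
import struct

def items():  # hypothetical generator
    return (x * x for x in range(1000))

with open("out.bin", "wb") as f:
    f.write(struct.pack("<Q", 0))        # dummy 8-byte count in the header
    count = 0
    for item in items():
        f.write(struct.pack("<q", item))  # write each item as it arrives
        count += 1
    f.seek(0)
    f.write(struct.pack("<Q", count))    # backpatch the real count
```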