How to slice a generator object or iterator?

I would like to loop over a "slice" of an iterator. I'm not sure if this is possible as I understand that it is not possible to slice an iterator. What I would like to do is this:

def f():
    for i in range(100):
        yield i
x = f()

for i in x[95:]:
    print(i)

This of course fails with:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-37-15f166d16ed2> in <module>()
  4 x = f()
  5 
----> 6 for i in x[95:]:
  7     print(i)

TypeError: 'generator' object is not subscriptable

Is there a pythonic way to loop through a "slice" of a generator?

Basically the generator I'm actually concerned with reads a very large file and performs some operations on it line by line. I would like to test slices of the file to make sure that things are performing as expected, but it is very time consuming to let it run over the entire file.

Edit:
As mentioned, I need to do this on a file. I was hoping there was a way of specifying this explicitly with the generator, for instance:

import itertools

import skbio

f = 'seqs.fna'
seqs = skbio.io.read(f, format='fasta')  # seqs is a generator object

for seq in itertools.islice(seqs, 30516420, 30516432):
    # do a bunch of stuff here
    pass

The above code does what I need, but it is still very slow because the generator still loops through all of the preceding lines. I was hoping to loop over only the specified slice.

asked Jan 11 '16 22:01

johnchase

3 Answers

In general, the answer is itertools.islice, but you should note that islice doesn't, and can't, actually skip values. It just pulls and throws away the first start values before it begins yielding. So it's usually best to avoid islice when you need to skip a lot of values and/or the values being skipped are expensive to acquire or compute. If you can find a way to not generate the values in the first place, do so. In your (obviously contrived) example, you'd just adjust the start index for the range object.
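To make that cost concrete, here is a minimal sketch. The counting_gen helper and the produced list are hypothetical names used only for illustration; they record how many values the generator actually computes in each approach.

```python
from itertools import islice

produced = []  # records every value the generator actually computes

def counting_gen(start, stop):
    """Hypothetical generator that logs each value it produces."""
    for i in range(start, stop):
        produced.append(i)
        yield i

# islice discards the first 95 values, but the generator still computes them.
sliced = list(islice(counting_gen(0, 100), 95, None))
computed_with_islice = len(produced)

produced.clear()

# Starting the underlying range at 95 never computes the skipped values.
direct = list(counting_gen(95, 100))
computed_directly = len(produced)
```

Both approaches yield the same five values, but the islice version runs the generator body 100 times while the adjusted range runs it only 5 times.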

In the specific case of running on a file object, pulling a huge number of lines (particularly when reading from a slow medium) may not be ideal. Assuming you don't need specific lines, one trick you can use to avoid actually reading huge blocks of the file, while still testing some distance into the file, is to seek to a guessed offset, read out to the end of the line (to discard the partial line you probably landed in the middle of), then islice off however many lines you want from that point. For example:

import itertools

with open('myhugefile') as f:
    # Assuming roughly 80 characters per line, this seeks to somewhere roughly
    # around the 100,000th line without reading in the data preceding it
    f.seek(80 * 100000)
    next(f)  # Throw away the partial line you probably landed in the middle of
    for line in itertools.islice(f, 100):  # Process 100 lines
        # Do stuff with each line
        pass

For the specific case of files, you might also want to look at mmap, which can be used in similar ways (and is especially useful if you're processing blocks of data rather than lines of text, possibly jumping around randomly as you go).
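As a rough sketch of the mmap approach, the following builds a small temporary file (a stand-in for the real large file, and purely an assumption for the sake of a runnable example), maps it, jumps to a guessed byte offset, discards the partial line, and reads a few lines from there.

```python
import mmap
import os
import tempfile

# Build a small sample file so the sketch is runnable; in practice this
# would be the existing large file. Each line here is exactly 10 bytes.
with tempfile.NamedTemporaryFile('wb', delete=False) as tmp:
    for n in range(1000):
        tmp.write(b'line %04d\n' % n)
    path = tmp.name

lines = []
with open(path, 'rb') as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        mm.seek(10 * 500 + 3)  # guessed byte offset; lands inside line 500
        mm.readline()          # discard the partial line
        for _ in range(3):     # read the next three complete lines
            lines.append(mm.readline())

os.remove(path)
```

Because the file is mapped rather than read, only the pages that are actually touched need to be loaded. The lines come back as bytes, so decode them if you need text.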

Update: From your updated question, you'll need to look at your API docs and/or data format to figure out exactly how to skip around properly. It looks like skbio offers some features for skipping using seq_num, but that's still going to read, if not process, most of the file. If the data was written out with equal sequence lengths, I'd look at the docs on Alignment; aligned data may be loadable without processing the preceding data at all, e.g. by using Alignment.subalignment to create new Alignments that skip the rest of the data for you.
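If every record really does have the same length in bytes, the offset of record n is just n times the record size, so a plain seek can jump straight to the slice. The sketch below uses ordinary file I/O rather than skbio; the read_records helper, the 16-byte record size, and the sample file layout are all assumptions made for illustration.

```python
import os
import tempfile

RECORD_SIZE = 16  # assumed: every record is exactly 16 bytes, newline included

def read_records(path, start, stop, record_size=RECORD_SIZE):
    """Yield records start..stop-1 without reading the records before start."""
    with open(path, 'rb') as f:
        f.seek(start * record_size)  # jump directly to the first wanted record
        for _ in range(stop - start):
            record = f.read(record_size)
            if len(record) < record_size:
                break  # reached the end of the file
            yield record

# Sample data so the sketch is runnable: 100 fixed-width records.
with tempfile.NamedTemporaryFile('wb', delete=False) as tmp:
    for n in range(100):
        tmp.write(b'record-%08d\n' % n)
    path = tmp.name

subset = list(read_records(path, 95, 98))
os.remove(path)
```

This only works when the record size is truly constant; a FASTA file with variable-length headers or sequences would need an index of byte offsets built in a single earlier pass.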

answered Oct 14 '22 05:10

ShadowRanger

You can't slice a generator object or iterator using a normal slice operation. Instead, you need to use itertools.islice, as @jonrsharpe already mentioned in his comment.

import itertools

# Equivalent of x[95:]: skip the first 95 items, then yield the rest
for i in itertools.islice(x, 95, None):
    print(i)

Also note that islice returns an iterator and consumes data from the underlying iterator or generator. So if you need to go back over the data, you will have to convert it to a list, create a new generator object, or use the little-known itertools.tee to split your generator into independent iterators.

from itertools import tee


first, second = tee(f())
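A minimal sketch of how the two branches behave, reusing the f() generator from the question. Keep in mind that tee buffers every item one branch has consumed and the other has not, so memory use grows if the branches drift far apart, and the original generator should not be advanced directly once it has been passed to tee.

```python
from itertools import islice, tee

def f():
    for i in range(100):
        yield i

first, second = tee(f())

tail = list(islice(first, 95, None))  # consumes `first` only
head = list(islice(second, 3))        # `second` still starts from 0
```

Here tail holds the last five values and head holds the first three, even though first was exhausted before second was touched.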

answered Oct 14 '22 06:10

styvane

islice is the pythonic way

from itertools import islice    

g = (i for i in range(100))

for num in islice(g, 95, None):
    print(num)
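islice takes start, stop, and step arguments much like a regular slice, but it does not accept negative values. As a small sketch (the numbers helper is a hypothetical stand-in for any generator), a stepped slice works directly, while the equivalent of seq[-5:] can be obtained with collections.deque and its maxlen argument, which still consumes the whole iterator but keeps only the last few items in memory.

```python
from collections import deque
from itertools import islice

def numbers():
    """Hypothetical stand-in for any generator."""
    return (i for i in range(100))

middle = list(islice(numbers(), 10, 20, 3))   # like seq[10:20:3]
last_five = list(deque(numbers(), maxlen=5))  # like seq[-5:]
```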

answered Oct 14 '22 06:10

Yoav Glazner


Tags:

python

generator

slice

for-loop