How do I get at the contents of an iterator?

Tags:

beautifulsoup

I'm thoroughly puzzled. I have a block of HTML that I scraped out of a larger table. It looks about like this:

<td align="left" class="page">Number:\xc2\xa0<a class="topmenu" href="http://www.example.com/whatever.asp?search=724461">724461</a> Date:\xc2\xa01/1/1999 Amount:\xc2\xa0$2.50 <br/>Person:<br/><a class="topmenu" href="http://www.example.com/whatever.asp?search=LAST&amp;searchfn=FIRST">LAST,\xc2\xa0FIRST </a> </td>

(Actually, it looked worse, but I regexed out a lot of line breaks)

I need to get the lines out, and break up the Date/Amount line. It seemed like the place to start was to find the children of that block of HTML. The block is a string because that's how regex gave it back to me. So I did:

text_soup = BeautifulSoup(text)
text_children = text_soup.find('td').childGenerator()

I've worked out that I can only iterate through text_children once, though I don't understand why that is. It's a listiterator type, which I'm struggling to understand.

I'm used to being able to assume that if I can iterate through something with a for loop I can call on any one element with something like text_children[0]. That doesn't seem to be the case with an iterator. If I create a list with:

my_array = ["one","two","three"]

I can use my_array[1] to see the second item in the array. If I try to do text_children[1] I get an error:

TypeError: 'listiterator' object is not subscriptable

How do I get at the contents of an iterator?


asked Nov 21 '12 14:11

Amanda

1 Answer

You can easily construct a list from the iterator:

my_list = list(your_generator)

Now you can subscript the elements:

print(my_list[1])
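As a rough illustration of the conversion above, here is a minimal sketch. The iterator is a stand-in built with iter() over made-up strings, not the real result of childGenerator(), so the values are assumptions for demonstration only.

```python
# Hypothetical stand-in for the iterator from the question; the values are made up.
text_children = iter(["Number:", "724461", "Date: 1/1/1999"])

children = list(text_children)   # walks the iterator once and keeps every item
second = children[1]             # a list supports indexing

# The original iterator has been used up by list(), so nothing is left in it.
leftover = list(text_children)

print(second)     # 724461
print(leftover)   # []
```

Note that list() itself consumes the iterator, so keep working with the list afterwards rather than going back to the iterator.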

Another way to get a value is to call next on the iterator, which pulls out the next value. As you've already discovered, though, once you pull a value out of an iterator, you can't always put it back (whether you can depends entirely on the object being iterated over and what its next method actually does).
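A small sketch of pulling values with the built-in next(), assuming a plain list iterator over example numbers (the names here are illustrative, not from the question). It also shows the one-pass behaviour: values already taken do not come back, but the underlying list can hand out a fresh iterator.

```python
source = [10, 20, 30]           # example data, assumed for illustration
numbers = iter(source)

first = next(numbers)           # takes 10 out of the iterator
second = next(numbers)          # takes 20
rest = list(numbers)            # only what is left: [30]

# Passing a default avoids a StopIteration error on an exhausted iterator.
done = next(numbers, None)

# The list itself is untouched, so a new iterator starts from the beginning.
again = next(iter(source))

print(first, second, rest, done, again)   # 10 20 [30] None 10
```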

The reason for this is that often you just want an object that you can iterate over. Iterators are great for that because they can calculate the elements one at a time rather than needing to store all of the values. In other words, only one element from the iterator consumes your system's memory at a time, whereas with a list or a tuple all of the elements are typically stored in memory before you start iterating.
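To make the memory point concrete, here is a hedged sketch comparing a generator with a fully built list. The squares function is a hypothetical example, and sys.getsizeof only measures the container object itself (on CPython), so treat the comparison as indicative rather than an exact memory measurement.

```python
import sys

def squares(n):
    """Hypothetical generator: yields one square at a time instead of storing them."""
    for i in range(n):
        yield i * i

lazy = squares(100000)                    # nothing computed yet
eager = [i * i for i in range(100000)]    # all 100000 values stored up front

lazy_size = sys.getsizeof(lazy)           # small, fixed-size generator object
eager_size = sys.getsizeof(eager)         # grows with the number of elements

print(lazy_size < eager_size)             # True

# The generator still produces every value when iterated over.
total = sum(squares(4))                   # 0 + 1 + 4 + 9
print(total)                              # 14
```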

answered Sep 27 '22 22:09

mgilson
