
How do I get at the contents of an iterator?

I'm thoroughly puzzled. I have a block of HTML that I scraped out of a larger table. It looks about like this:

<td align="left" class="page">Number:\xc2\xa0<a class="topmenu" href="http://www.example.com/whatever.asp?search=724461">724461</a> Date:\xc2\xa01/1/1999 Amount:\xc2\xa0$2.50 <br/>Person:<br/><a class="topmenu" href="http://www.example.com/whatever.asp?search=LAST&amp;searchfn=FIRST">LAST,\xc2\xa0FIRST </a> </td>

(Actually, it looked worse, but I regexed out a lot of line breaks)

I need to get the lines out, and break up the Date/Amount line. It seemed like the place to start was to find the children of that block of HTML. The block is a string because that's how regex gave it back to me. So I did:

text_soup = BeautifulSoup(text)
text_children = text_soup.find('td').childGenerator()

I've worked out that I can only iterate through text_children once, though I don't understand why that is. It's a listiterator type, which I'm struggling to understand.

I'm used to being able to assume that if I can iterate through something with a for loop I can call on any one element with something like text_children[0]. That doesn't seem to be the case with an iterator. If I create a list with:

my_array = ["one","two","three"] 

I can use my_array[1] to see the second item in the array. If I try to do text_children[1] I get an error:

TypeError: 'listiterator' object is not subscriptable

How do I get at the contents of an iterator?

asked Nov 21 '12 at 14:11 by Amanda



1 Answer

You can easily construct a list from the iterator:

my_list = list(your_generator)

Now you can subscript the elements:

print(my_list[1])
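
As a minimal sketch of that, here is the same idea with a plain iter() over a list standing in for the childGenerator() result (the stand-in values are made up for illustration, not taken from your real data):

```python
# Stand-in for text_soup.find('td').childGenerator(); the values are illustrative only.
text_children = iter(["Number:", "724461", "Date: 1/1/1999"])

# list() walks the iterator once and stores every element it yields.
my_list = list(text_children)

print(my_list[1])    # second element, now reachable by index
print(len(my_list))  # a list also knows its length

# The iterator itself is now used up, so a second pass finds nothing.
leftover = list(text_children)
print(leftover)
```

Once the elements are in my_list you can index, slice, and loop over them as many times as you like.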

Another way to get a value is with next(), which pulls the next value from the iterator. As you've already discovered, once you pull a value out of an iterator you generally can't put it back (whether you can depends entirely on the object being iterated over and what its next method actually does).
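
A small sketch of how next() behaves, again using an ordinary list iterator with made-up values rather than your BeautifulSoup object:

```python
numbers = iter([10, 20, 30])

# Each call to next() hands back one value and advances the iterator.
first = next(numbers)
second = next(numbers)

# A loop picks up wherever the iterator currently is, not from the start.
remaining = [n for n in numbers]

# An exhausted iterator raises StopIteration; passing a default avoids that.
sentinel = next(numbers, None)

print(first, second, remaining, sentinel)
```

This is also why you could only loop over text_children once: the first loop advanced it all the way to the end.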

The reason for this is that often you just want an object you can iterate over. Iterators are great for that because they can produce their elements one at a time rather than storing all of the values. In other words, only one element from the iterator needs to occupy your system's memory at a time, whereas with a list or a tuple all of the elements are typically stored in memory before you start iterating.
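
To make the memory point concrete, here is a rough sketch comparing a generator with the equivalent list. The squares() helper is hypothetical, written just for this illustration, and sys.getsizeof only measures the container object itself, so treat the numbers as indicative rather than exact:

```python
import sys

def squares(n):
    """Hypothetical helper: yield i*i lazily, one value per request."""
    for i in range(n):
        yield i * i

lazy = squares(100_000)                     # nothing computed yet
eager = [i * i for i in range(100_000)]     # every value computed and stored now

lazy_size = sys.getsizeof(lazy)     # size of the generator object only
eager_size = sys.getsizeof(eager)   # size of the list's internal array

print(lazy_size < eager_size)

# Both give the same values when iterated; the generator just does it on demand.
total = sum(lazy)
print(total == sum(eager))
```

The trade-off is the one you ran into: the lazy version cannot be indexed or replayed, so convert it with list() when you need that.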

answered Sep 27 '22 at 22:09 by mgilson