
Setting HTML textarea contents with python, BeautifulSoup, mechanize (no forms, just divs)

I'm trying to fill out a form containing a textarea element. I'm using Python with the BeautifulSoup and mechanize modules (stuck on Python 2.6.5 on FreeBSD 8.1, with the latest modules in the FreeBSD repository: BeautifulSoup 3.1.0.1 and mechanize 0.2.1).

The problem with BeautifulSoup is that it doesn't seem to set textarea contents properly. I've tried soup.textarea.insert(0, "FOO") and even soup.textarea.contents = "FOO", but when I check the current value with soup.textarea, I still see the old HTML tags with no content between them:

<textarea name="classified_description" class="classified_textarea_text"></textarea>

The problem with mechanize is it only seems to operate on true forms. Per the HTML I'm parsing below, this is not really a form, but rather a set of divs with input items inside.
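Because mechanize's form handling only sees real `<form>` elements, one possible workaround is to skip form parsing entirely and build the request body yourself. The sketch below assumes the page ultimately submits the field as an ordinary URL-encoded POST; the endpoint `http://example.com/classified/post` and the helper `build_description_request` are hypothetical, and the real URL would have to be found in the page's JavaScript or the browser's network traffic. It uses the Python 3 standard library rather than mechanize (in Python 2 the equivalents are `urllib.urlencode` and `urllib2.Request`).

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint: replace with the URL the page actually posts to.
POST_URL = "http://example.com/classified/post"


def build_description_request(description):
    """Build (but do not send) a POST request carrying the textarea's value.

    The field name is taken from the textarea's name attribute in the HTML.
    """
    body = urlencode({"classified_description": description}).encode("ascii")
    return Request(
        POST_URL,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


req = build_description_request("some text here")
print(req.get_method())  # POST (Request uses POST when data is given)
print(req.data)          # b'classified_description=some+text+here'
# Sending it would be: urllib.request.urlopen(req)
```

Any cookies, hidden inputs, or tokens the site expects are not handled here and would need to be added to the dictionary passed to `urlencode`.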

How can I use Python or either of these modules to set the value of this textarea element?

<div class="classified_field">
            <div class="classified_input_label">Description</div>
            <div class="classified_textarea_div">
                <textarea name="classified_description" id="classified_description" class="classified_textarea_text"></textarea>
            </div>
            <div class="site_clear"></div>
        </div>
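If the goal is only to produce HTML with the textarea filled in, the Python 3 standard library's `html.parser` can do it without BeautifulSoup. This is a swapped-in technique, not BeautifulSoup's or mechanize's API: a minimal sketch that re-emits the markup it parses and replaces whatever sits between the matching `<textarea>` tags. The names `TextareaFiller` and `fill_textarea` are hypothetical.

```python
from html import escape
from html.parser import HTMLParser


class TextareaFiller(HTMLParser):
    """Re-emit HTML, replacing the body of <textarea name=NAME> with new text.

    Minimal sketch: start tags are copied verbatim via get_starttag_text();
    end tags are normalized to </tag>.
    """

    def __init__(self, name, value):
        # Keep entity/char refs separate so they can be re-emitted unchanged.
        super().__init__(convert_charrefs=False)
        self.name = name
        self.value = value
        self.out = []
        self.in_target = False

    def handle_starttag(self, tag, attrs):
        self.out.append(self.get_starttag_text())
        if tag == "textarea" and dict(attrs).get("name") == self.name:
            self.in_target = True
            # Escape the new value so "<" or "&" cannot create markup.
            self.out.append(escape(self.value, quote=False))

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "textarea":
            self.in_target = False
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if not self.in_target:  # drop the target textarea's old contents
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.in_target:
            self.out.append("&%s;" % name)

    def handle_charref(self, name):
        if not self.in_target:
            self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)


def fill_textarea(html, name, value):
    """Return html with the named textarea's contents replaced by value."""
    parser = TextareaFiller(name, value)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

For example, `fill_textarea(page_html, "classified_description", "FOO")` returns the page with `FOO` between the textarea tags. Working on the markup stream avoids needing a form at all, but it only changes the HTML text; it does not submit anything.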

I tried Vladimir's technique below, and while it works with his example, it does not work in my production code for some reason. I can use .find() to get the textarea, but .insert() fails. Here's what I have so far:

>>> soup.find('textarea', {'name': 'classified_description'})                  
<textarea name="classified_description" class="classified_textarea_text"></textarea>
>>> soup.find('textarea', {'name': 'classified_description'}).insert(0, "some text here")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.6/site-packages/BeautifulSoup.py", line 233, in insert
  newChild.nextSibling.previousSibling = newChild
AttributeError: 'unicode' object has no attribute 'previousSibling'
>>> 

Does anyone know why this would throw the unicode error? My soup object is clearly not just a unicode string, because .find() works on it.

SOLUTION: Vladimir's solution is correct, but real-world HTML can trigger a "malformed start tag" error in BeautifulSoup 3.1 (official reason here). After downgrading to BeautifulSoup 3.0.8, everything worked fine. When I posted the original question, I had done some jury-rigging to get mechanize's read() output into BeautifulSoup without hitting the malformed start tag error. This created a unicode string instead of a BeautifulSoup object. Fixing my mechanize code and using the older BeautifulSoup produced the desired behavior.

hamx0r asked Nov 04 '22 18:11


1 Answer

Here is an example using BeautifulSoup:

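# BeautifulSoup 3 (Python 2), as used in the question
from BeautifulSoup import BeautifulSoup

soup = BeautifulSoup('<textarea name="classified_description"></textarea>')
# Locate the textarea by its name attribute and insert text as its first child
textarea = soup.find('textarea', {'name': 'classified_description'})
textarea.insert(0, 'value')
assert str(soup) == '<textarea name="classified_description">value</textarea>'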

The BeautifulSoup documentation on modifying the parse tree describes such transformations in detail.

Vladimir answered Nov 09 '22 17:11