
How to change tag name with BeautifulSoup?

I am using Python and BeautifulSoup to parse an HTML document.

Now I need to replace every <h2 class="someclass"> element in the document with <h1 class="someclass">.

How can I change the tag name, without changing anything else in the document?

daphshez asked Mar 13 '11 11:03



2 Answers

I don't know how you're accessing the tag, but the following works for me:

import BeautifulSoup

if __name__ == "__main__":
    data = """
<html>
<h2 class='someclass'>some title</h2>
<ul>
   <li>Lorem ipsum dolor sit amet, consectetuer adipiscing elit.</li>
   <li>Aliquam tincidunt mauris eu risus.</li>
   <li>Vestibulum auctor dapibus neque.</li>
</ul>
</html>

    """
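    soup = BeautifulSoup.BeautifulSoup(data)
    h2 = soup.find('h2')   # first <h2> in the document
    h2.name = 'h1'         # renaming keeps the tag's attributes and children
    print soup             # Python 2 print statement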

The output of print soup is:

<html>
<h1 class='someclass'>some title</h1>
<ul>
<li>Lorem ipsum dolor sit amet, consectetuer adipiscing elit.</li>
<li>Aliquam tincidunt mauris eu risus.</li>
<li>Vestibulum auctor dapibus neque.</li>
</ul>
</html>

As you can see, h2 became h1, and nothing else in the document changed. I am using Python 2.6 and BeautifulSoup 3.2.0.

If you have more than one h2 and you want to change them all, you could simply do:

soup = BeautifulSoup.BeautifulSoup(your_data)
while True: 
    h2 = soup.find('h2')
    if not h2:
        break
    h2.name = 'h1'
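
For readers on Python 3, here is a rough sketch of the same idea using the newer bs4 package, which is an assumption on my part since the answer above was written against BeautifulSoup 3. The helper name rename_tags and the sample markup are made up for illustration; find_all with class_ limits the rename to the class mentioned in the question.

```python
from bs4 import BeautifulSoup


def rename_tags(html, old_name, new_name, css_class=None):
    """Return html with matching <old_name> tags renamed to <new_name>.

    rename_tags is a hypothetical helper, not part of BeautifulSoup.
    """
    soup = BeautifulSoup(html, "html.parser")
    # Only filter on the CSS class when one was requested.
    filters = {"class_": css_class} if css_class else {}
    for tag in soup.find_all(old_name, **filters):
        tag.name = new_name  # attributes and children stay as they were
    return str(soup)


# Illustrative input: two headings with the target class, one without.
sample = (
    '<h2 class="someclass">first</h2>'
    '<h2 class="other">second</h2>'
    '<h2 class="someclass">third</h2>'
)
result = rename_tags(sample, "h2", "h1", css_class="someclass")
print(result)
```

Unlike the while loop, this walks the list of matches once, so it also works when you only want to rename a subset of the h2 tags.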
Manuel Salvadores answered Sep 28 '22 12:09


It's just:

tag.name = 'new_name'
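
A quick way to convince yourself that assigning to name touches nothing else is to compare the tag's attributes before and after. This is only a sketch and assumes the bs4 package on Python 3; the markup and variable names are invented for the example.

```python
from bs4 import BeautifulSoup

# Illustrative markup with attributes and a nested child tag.
soup = BeautifulSoup(
    '<h2 class="someclass" id="t">some <b>title</b></h2>', "html.parser"
)
tag = soup.find("h2")
attrs_before = dict(tag.attrs)  # snapshot of the attributes

tag.name = "h1"  # the only change made to the tree

attrs_after = dict(tag.attrs)
print(attrs_before == attrs_after)  # attributes are the same object contents
print(soup)
```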
Andrey Shipilov answered Sep 28 '22 13:09