Replacing tags of one kind with tags of another in BeautifulSoup

I have a collection of HTML files. I want to iterate over them one by one, editing the markup of elements with a particular class. The markup I want to edit has the following form and class names:

<td class='thisIsMyClass' colspan=4>
  <a id='123' class='thisIsMyOtherClass' href='123'>Put me Elsewhere</a> 

This can occur multiple times in the same document, with different text instead of "Put me Elsewhere", but always the same classes.

I want to change this to the form:

<font SIZE="3"  COLOR="#333333"  FACE="Verdana"  STYLE="background-color:#ffffff;font-weight: bold;">
  <h2>Put Me Elsewhere</h2>
</font>
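
Here is what I have so far:

import os
from bs4 import BeautifulSoup

def replace(filename):
    # Parse the file and collect every element carrying the target class
    with open(filename) as f:
        soup = BeautifulSoup(f, 'html.parser')
    tags = soup.find_all(class_="thisIsMyClass")

for filename in os.listdir('dirname'):
    replace(os.path.join('dirname', filename))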

I'm not sure where to go from here or how to deal with the tags list. Any help would be much appreciated. Thanks :)

Simon Kiely asked Dec 01 '14 16:12


2 Answers

A cleaner approach is to prepare a replacement HTML string with a placeholder, find all td tags with the thisIsMyClass class, and use .replace_with() to replace each one:

from bs4 import BeautifulSoup

data = """
<table>
    <tr>
        <td class='thisIsMyClass' colspan=4>
          <a id='123' class='thisIsMyOtherClass' href='123'>Put me Elsewhere</a>
        </td>
    </tr>
</table>
"""

replacement = """
<font SIZE="3"  COLOR="#333333"  FACE="Verdana"  STYLE="background-color:#ffffff;font-weight: bold;">
  <h2>{text}</h2>
</font>
"""

soup = BeautifulSoup(data, 'html.parser')
for td in soup.select('td.thisIsMyClass'):
    td.replace_with(BeautifulSoup(replacement.format(text=td.a.text), 'html.parser'))

print(soup.prettify())

Prints:

<table>
    <tr>
        <font color="#333333" face="Verdana" size="3" style="background-color:#ffffff;font-weight: bold;">
            <h2>
            Put me Elsewhere
            </h2>
        </font>
    </tr>
</table>
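
To tie this back to the original goal of processing a whole directory, the same .replace_with() idea can be wrapped in a function and applied file by file. The following is only a sketch under some assumptions that are not in the question: the helper names replace_cells and replace_in_directory are made up for illustration, files are assumed to end in .html and be UTF-8 encoded, and each file is overwritten in place. The link text is escaped before being formatted into the template so that characters such as < or & are not re-parsed as markup.

```python
import os
from html import escape

from bs4 import BeautifulSoup

REPLACEMENT = """
<font SIZE="3" COLOR="#333333" FACE="Verdana" STYLE="background-color:#ffffff;font-weight: bold;">
  <h2>{text}</h2>
</font>
"""


def replace_cells(html):
    """Return html with every td.thisIsMyClass swapped for the font/h2 block."""
    soup = BeautifulSoup(html, "html.parser")
    for td in soup.select("td.thisIsMyClass"):
        if td.a is None:
            continue  # no link inside this cell, leave it untouched
        # Escape the text so it is inserted as text, not parsed as HTML
        text = escape(td.a.get_text(strip=True))
        new_block = BeautifulSoup(REPLACEMENT.format(text=text), "html.parser")
        td.replace_with(new_block)
    return str(soup)


def replace_in_directory(dirname):
    """Rewrite every .html file in dirname in place (assumed layout)."""
    for name in os.listdir(dirname):
        if not name.endswith(".html"):
            continue
        path = os.path.join(dirname, name)
        with open(path, encoding="utf-8") as f:
            updated = replace_cells(f.read())
        with open(path, "w", encoding="utf-8") as f:
            f.write(updated)
```

Calling replace_in_directory('dirname') would then update each matching file. Since this overwrites the originals, it is probably worth trying it on a copy of the directory first.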

alecxe answered Sep 30 '22 13:09


You can rename a tag in place by assigning to its .name attribute.

# for quick testing:
# tag = BeautifulSoup("<td class='thisIsMyClass' colspan=4><a id='123' class='thisIsMyOtherClass' href='123'>Put me Elsewhere</a>", "html.parser")
# tags = [tag]
for tag in tags:
    tag.td.name = "font"
    tag.font["SIZE"] = 3
    del tag.font["class"]
    ...
    tag.a.name = "h2"
    ...
    print(tag)
    # <font SIZE="3" colspan="4"><h2 class="thisIsMyOtherClass" href="123" id="123">Put me Elsewhere</h2></font>
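
As the printed result shows, renaming alone leaves the old attributes (colspan, class, href, id) on the new tags. One possible way to finish the job, sketched here rather than taken from the answer, is to replace each tag's .attrs dictionary after renaming it. The font attributes are copied from the question; the use of html.parser and of find_all with class_ are choices made for this example.

```python
from bs4 import BeautifulSoup

html = ("<td class='thisIsMyClass' colspan=4>"
        "<a id='123' class='thisIsMyOtherClass' href='123'>Put me Elsewhere</a></td>")
soup = BeautifulSoup(html, "html.parser")

for td in soup.find_all("td", class_="thisIsMyClass"):
    link = td.a
    if link is not None:
        link.name = "h2"
        link.attrs = {}  # drop id, class and href, which mean nothing on an <h2>

    td.name = "font"
    # Swap the old td attributes (class, colspan) for the desired font attributes
    td.attrs = {
        "size": "3",
        "color": "#333333",
        "face": "Verdana",
        "style": "background-color:#ffffff;font-weight: bold;",
    }

result = str(soup)
print(result)
```

The printed markup should contain a font element wrapping a bare <h2>Put me Elsewhere</h2>; the order in which the font attributes are serialized may depend on the Beautiful Soup version.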

Also, the Beautiful Soup documentation is your friend. It's quite comprehensive.

ento answered Sep 30 '22 15:09