BeautifulSoup: How to remove empty tables, while preserving tables that are partially empty or not empty

Question

I have an old website, originally created in MS FrontPage, that I'm trying to defrontpagify. I've written a BeautifulSoup script that does most of the work. The only thing left is to remove empty tables, e.g. tables with no text content or data in any of their td tags.

The problem I'm stuck on is that what I've tried so far removes a table if at least one of its td tags contains no data, even if the others do. That removes all the tables in the entire document, including ones with data I want to preserve.

tags = soup.findAll('table', text=None, recursive=True)
for tag in tags:
    tag.extract()

Any suggestions on how to remove only the tables in which none of the td tags contain any data? (I don't care whether they contain img or empty anchor tags, as long as there's no text.)
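A likely reason the snippet above removes every table: text=None is simply the default value of findAll's text argument and means "no condition on the text", so the call is equivalent to findAll('table') and matches all tables regardless of their content. The sketch below reproduces this with the current bs4 package (an assumption; the question uses the older BS3 import), where the argument is named string and text is the older name.

```python
from bs4 import BeautifulSoup

html = """
<table id="empty"><tr><td></td></tr></table>
<table id="with_text"><tr><td>hey!</td></tr></table>
"""

soup = BeautifulSoup(html, "html.parser")

# string=None is the default, so this is the same as find_all("table"):
# no condition on the text is applied and every table is returned.
matched = soup.find_all("table", string=None)
ids = [t["id"] for t in matched]
print(ids)  # ['empty', 'with_text']
```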

Avaris · Accepted Answer

Use the .text property. It returns all the text content inside the element, recursively.
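One caveat when porting this to the current bs4 package (an assumption; the answer below uses BS3): in bs4, .text is the same as get_text(), which keeps whitespace-only strings such as indentation and newlines. An indented but otherwise empty table therefore has a non-empty, truthy .text. A minimal sketch showing that get_text(strip=True) drops those strings:

```python
from bs4 import BeautifulSoup

html = """
<table id="empty">
  <tr><td></td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table")

raw = table.get_text()                 # includes the indentation and newlines
stripped = table.get_text(strip=True)  # whitespace-only strings are dropped

print(repr(raw))       # whitespace only, but not an empty string
print(repr(stripped))  # ''
```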

Example:

from BeautifulSoup import BeautifulSoup as BS

html = """
<table id="empty">
  <tr><td></td></tr>
</table>

<table id="with_text">
  <tr><td>hey!</td></tr>
</table>

<table id="with_text_in_one_row">
  <tr><td></td></tr>
  <tr><td>hey!</td></tr>
</table>

<table id="no_text_but_img">
  <tr><td><img></td></tr>
</table>

<table id="no_text_but_a">
  <tr><td><a></a></td></tr>
</table>

<table id="text_in_a">
  <tr><td><a>hey!</a></td></tr>
</table>

"""

soup = BS(html)
for table in soup.findAll("table", text=None, recursive=True):
    # .text gathers all text inside the table, recursively
    if table.text:
        print table["id"]

Outputs:

with_text
with_text_in_one_row
text_in_a
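The example above only prints the tables that have text. A sketch of the actual removal, assuming the current bs4 package on Python 3 with the built-in "html.parser" (both assumptions, not the answer's BS3 setup), extracts every table whose stripped text is empty:

```python
from bs4 import BeautifulSoup

html = """
<table id="empty">
  <tr><td></td></tr>
</table>
<table id="with_text">
  <tr><td>hey!</td></tr>
</table>
<table id="with_text_in_one_row">
  <tr><td></td></tr>
  <tr><td>hey!</td></tr>
</table>
<table id="no_text_but_img">
  <tr><td><img></td></tr>
</table>
<table id="no_text_but_a">
  <tr><td><a></a></td></tr>
</table>
<table id="text_in_a">
  <tr><td><a>hey!</a></td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all returns a list, so extracting while looping over it is safe.
for table in soup.find_all("table"):
    # Images and empty anchors contribute no text, so a table holding
    # only those counts as empty and is removed from the tree.
    if not table.get_text(strip=True):
        table.extract()

remaining = [t["id"] for t in soup.find_all("table")]
print(remaining)  # ['with_text', 'with_text_in_one_row', 'text_in_a']
```

Because the check looks at the whole table, text in a th or caption also keeps a table. If only td content should count, the condition could instead be all(not td.get_text(strip=True) for td in table.find_all("td")); note that this is also true for a table with no td at all.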

Tags: python, parsing, html-parsing, beautifulsoup

Asked by Kurtosis
