How to tell BeautifulSoup to extract the content of a specific tag as text? (without touching it)

Question

I need to parse an HTML document that contains "code" tags.

I'm getting the code blocks like this:

from BeautifulSoup import BeautifulSoup

soup = BeautifulSoup(str(content))
code_blocks = soup.findAll('code')

The problem is that if I have a code tag like this:

<code class="csharp">
    List<Person> persons = new List<Person>();
</code>

BeautifulSoup forces the nested tags to close and transforms the code block into:

<code class="csharp">
    List<person> persons = new List</person><person>();
    </person>
</code>

Is there any way to extract the content of the code tags as text with BeautifulSoup, without letting it fix what IT thinks are HTML markup errors?
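The root cause is that, to an HTML parser, <Person> is simply the start tag of an element named person (tag names are case-insensitive, hence the lowercase in the output). The minimal sketch below uses Python 3's standard-library html.parser rather than BeautifulSoup's own parser, so it only illustrates the general behavior; the class name TagLogger is made up for the example. It records the tag events reported for the snippet from the question:

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Hypothetical helper: record every start and end tag the parser reports."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


snippet = """<code class="csharp">
    List<Person> persons = new List<Person>();
</code>"""

logger = TagLogger()
logger.feed(snippet)
logger.close()
print(logger.events)
# [('start', 'code'), ('start', 'person'), ('start', 'person'), ('end', 'code')]
```

Neither person element is ever closed, so a tree-building parser has to close them somewhere, which is where the </person> tags in the output above come from.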

Rod · Accepted Answer

Add the code tag to the QUOTE_TAGS dictionary. BeautifulSoup does not parse the contents of tags listed there; it keeps them as literal text.

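# BeautifulSoup 3 (Python 2): the package is imported as "BeautifulSoup"
from BeautifulSoup import BeautifulSoup

content = "<code class='csharp'>List<Person> persons = new List<Person>();</code>"

# QUOTE_TAGS is a class-level dict, so this affects every BeautifulSoup
# object created afterwards: <code> contents are kept as literal text
BeautifulSoup.QUOTE_TAGS['code'] = None
soup = BeautifulSoup(content)
code_blocks = soup.findAll('code')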

Output:

[<code class="csharp"> List<Person> persons = new List<Person>(); </code>]
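The same idea can be reproduced without BeautifulSoup 3. Python 3's standard-library html.parser has a class attribute, CDATA_CONTENT_ELEMENTS (by default it includes script and style), listing elements whose contents are passed to handle_data as raw text up to the matching end tag. The sketch below adds code to that tuple. This attribute is an undocumented implementation detail of CPython's parser, so relying on it is an assumption that could break in a future Python version; the class name CodeBlockParser is hypothetical.

```python
from html.parser import HTMLParser


class CodeBlockParser(HTMLParser):
    """Collect the raw text of every <code> element without parsing it.

    Assumption: relies on the undocumented HTMLParser.CDATA_CONTENT_ELEMENTS
    attribute; elements listed there switch the parser into raw-text mode.
    """

    # Keep the default raw-text elements and add <code>
    CDATA_CONTENT_ELEMENTS = HTMLParser.CDATA_CONTENT_ELEMENTS + ("code",)

    def __init__(self):
        super().__init__()
        self.code_blocks = []  # finished blocks, as raw strings
        self._current = None   # pieces of the block being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "code":
            self._current = []

    def handle_data(self, data):
        # Only text inside a <code> element is collected
        if self._current is not None:
            self._current.append(data)

    def handle_endtag(self, tag):
        if tag == "code" and self._current is not None:
            self.code_blocks.append("".join(self._current))
            self._current = None


html = """<code class="csharp">
    List<Person> persons = new List<Person>();
</code>"""

parser = CodeBlockParser()
parser.feed(html)
parser.close()
for block in parser.code_blocks:
    print(block.strip())  # prints: List<Person> persons = new List<Person>();
```

As with QUOTE_TAGS, a literal </code> inside a block would end it early, and entities such as &lt; come back exactly as written rather than decoded.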


Tags: python, beautifulsoup, syntax-highlighting

BFil
