
< and > changed to &lt; and &gt; while parsing HTML with BeautifulSoup in Python

While processing HTML with BeautifulSoup, the < and > characters were converted to &lt; and &gt;. Since the tag delimiters were all converted, the whole soup lost its structure. Any suggestions?

flyingfoxlee asked Feb 03 '13 03:02




1 Answer

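Passing formatter=None to an output method such as prettify() may help (see http://www.crummy.com/software/BeautifulSoup/bs4/doc/#output-formatters), since it turns off entity substitution on output. That said, this behaviour might be an indication that your HTML is invalid.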
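To make the formatter suggestion concrete, here is a minimal sketch, assuming bs4 with the built-in html.parser backend. The sample markup and variable names are made up for illustration and are not taken from the question. It only shows how the output formatter affects serialization. It will not repair input whose tags were already escaped before parsing.

```python
from bs4 import BeautifulSoup

# Hypothetical sample input: a paragraph whose text contains escaped characters.
html = "<p>a &lt; b &amp; c</p>"
soup = BeautifulSoup(html, "html.parser")

# Default ("minimal") formatter: <, > and & in text are re-escaped on output.
default_out = soup.p.decode()

# formatter=None: text is written out as-is, with no entity substitution.
# Note that the result may no longer be valid HTML.
raw_out = soup.p.decode(formatter=None)

print(default_out)
print(raw_out)
```

The same formatter argument is accepted by prettify() and encode(). If the tags themselves show up as &lt;...&gt; in the output, the input was probably escaped before it reached the parser, and changing the formatter alone is unlikely to bring the structure back.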

If that doesn't work, can you provide some sample code and HTML which reproduces the problem?

rkday answered Oct 09 '22 20:10