find() after replaceWith() doesn't work (using BeautifulSoup)

Tags:

python

find

beautifulsoup

Please consider the following Python session:

>>> from BeautifulSoup import BeautifulSoup
>>> s = BeautifulSoup("<p>This <i>is</i> a <i>test</i>.</p>"); myi = s.find("i")
>>> myi.replaceWith(BeautifulSoup("was"))
>>> s.find("i")
>>> s = BeautifulSoup("<p>This <i>is</i> a <i>test</i>.</p>"); myi = s.find("i")
>>> myi.replaceWith("was")
>>> s.find("i")
<i>test</i>

Note that s.find("i") on line 4 prints nothing (it returns None), whereas the same call on line 7 prints the remaining <i> element.

What's the reason for this? Is there a workaround?

EDIT: Actually, the example doesn't demonstrate the use case, which is:

myi.replaceWith(BeautifulSoup("wa<b>s</b>"))

Whenever the inserted part itself contains nontrivial HTML, I don't see how this syntax could be replaced with something else. Just writing

myi.replaceWith("wa<b>s</b>")

escapes the HTML special characters as entities.
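To make the entity problem concrete, here is a minimal sketch of what escaping does to the replacement fragment. It uses html.escape from the Python 3 standard library purely as an illustration. BeautifulSoup has its own escaping code, so this only shows the kind of output you end up with when the fragment is treated as text, not how the library does it.

```python
import html

fragment = "wa<b>s</b>"

# Treated as plain text, the angle brackets become entities,
# so the <b> tag no longer exists as markup.
escaped = html.escape(fragment)
print(escaped)  # wa&lt;b&gt;s&lt;/b&gt;

# Unescaping gives the original characters back.
restored = html.unescape(escaped)
print(restored == fragment)  # True
```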

asked Mar 16 '13 21:03

thomas

3 Answers

A simpler answer: after your call to replaceWith, rebuild s by re-parsing its rendered output with s = BeautifulSoup(s.renderContents()). After that, find works again.

answered Oct 07 '22 06:10

Steve K

The problem seems to be that a BeautifulSoup object is considered an entire document. find iterates through the document asking each element for the next element after it. But when it gets to your BeautifulSoup("was"), that object thinks it is the whole document, so it says there is nothing after it. This aborts the search too early.
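A toy model of that explanation, in case it helps. This is not BeautifulSoup's code: Node, link, build and find are hypothetical names invented here, and the broken next pointer is simply assumed in order to mirror the description above.

```python
class Node:
    """Hypothetical stand-in for a parsed element; not a BeautifulSoup class."""

    def __init__(self, name):
        self.name = name
        self.next_element = None  # next node in document order


def link(nodes):
    """Chain the nodes in document order and return the first one."""
    for current, following in zip(nodes, nodes[1:]):
        current.next_element = following
    return nodes[0]


def build(replacement):
    """Model <p>This [replacement] a <i>test</i>.</p> as a chain of nodes."""
    return link([Node("p"), Node("text"), replacement, Node("text"), Node("i")])


def find(start, name):
    """Follow next_element from start; return the first node called name."""
    node = start
    while node is not None:
        if node.name == name:
            return node
        node = node.next_element
    return None


# Replacement is a plain string node: the chain stays intact.
head_ok = build(Node("text"))

# Replacement is a whole document: assume it reports no successor.
nested_doc = Node("document")
head_broken = build(nested_doc)
nested_doc.next_element = None  # assumption that mirrors the explanation

print(find(head_ok, "i") is not None)  # True
print(find(head_broken, "i"))          # None
```

In this model the walk ends at the nested document, so the <i> after it is never reached.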

I don't think BeautifulSoup is designed to have BeautifulSoup objects nested inside other BeautifulSoup objects. The workaround is: don't do that. Why do you feel you need the first form instead of the second one, which already works? If you want to replace an element with a bit of HTML, use a Tag as the replacement, not a BeautifulSoup object.

answered Oct 07 '22 05:10

BrenBarn

I think I found a workaround that solves the issue for me. Here is the whole code again as a Python script, to give a complete example:

from BeautifulSoup import BeautifulSoup
s = BeautifulSoup("<p>This <i>is</i> a <i>test</i>.</p>")
myi = s.find("i")
s2 = BeautifulSoup("wa<b>s</b>")
myi_id = myi.parent.contents.index(myi)
for c in reversed(s2.contents):
    myi.parent.insert(myi_id + 1, c)
myi.extract()

Please note that this won't work without reversed(). If you skip it, you don't merely change the order of the elements: one of them also goes missing. If you really want the order reversed, you have to write the following:

for c in list(s2.contents):
    myi.parent.insert(myi_id + 1, c)

Can somebody please explain why skipping list() omits the <b>s</b> element? (Please answer in a comment, because this is not the main question here.)
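A likely explanation, sketched with plain Python lists instead of BeautifulSoup. The assumption is that insert() detaches a node from its old parent, so s2.contents shrinks while the loop is still iterating over it. move_all and fresh are hypothetical helpers written only for this illustration.

```python
def move_all(source, dest, position, order):
    """Move every item of source into dest at position.

    Moving removes the item from source, imitating a node that is
    detached from its old parent when inserted elsewhere (assumed).
    """
    for item in order(source):
        source.remove(item)          # the item leaves its old container
        dest.insert(position, item)  # and lands in the new one
    return dest


def fresh():
    """Return (replacement contents, parent contents, insert position)."""
    parent = ["This ", "<i>is</i>", " a ", "<i>test</i>", "."]
    return ["wa", "<b>s</b>"], parent, parent.index("<i>is</i>") + 1


# Iterating over the live list: it shrinks under the loop,
# so the second item is never visited.
plain = move_all(*fresh(), order=iter)

# list() takes a snapshot first: both items move, in reversed order.
snapshot = move_all(*fresh(), order=list)

# reversed() walks from the end, so removing items does not skip any.
backwards = move_all(*fresh(), order=reversed)

print(plain)      # "<b>s</b>" is missing
print(snapshot)   # "<b>s</b>" comes before "wa"
print(backwards)  # "wa" comes before "<b>s</b>"
```

This is the usual pitfall of modifying a list while looping over it. Whether BeautifulSoup behaves the same way depends on insert() really removing the child from s2.contents.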

answered Oct 07 '22 06:10

thomas
