I need to copy part of one document into another, but I don't want to modify the document I copy from. If I use <code>.extract()</code>, it removes the element from the tree. If I just append the selected element, as in <code>document2.append(document1.tag)</code>, it is still removed from document1. Since I work with real files, I could simply not save document1 after the modification, but is there a way to do this without corrupting the source document?

clone element with beautifulsoup

3 Answers

There is no native clone function in BeautifulSoup in versions before 4.4 (released July 2015); you'd have to create a deep copy yourself, which is tricky as each element maintains links to the rest of the tree.

To clone an element and all its descendants, you have to copy all attributes and reset the parent-child relationships, and this has to happen recursively. The simplest approach is not to copy the relationship attributes at all and instead to re-seat each recursively cloned element by appending it to its new parent:

from bs4 import Tag, NavigableString

def clone(el):
    if isinstance(el, NavigableString):
        return type(el)(el)

    copy = Tag(None, el.builder, el.name, el.namespace, el.nsprefix)
    # work around bug where there is no builder set
    # https://bugs.launchpad.net/beautifulsoup/+bug/1307471
    copy.attrs = dict(el.attrs)
    for attr in ('can_be_empty_element', 'hidden'):
        setattr(copy, attr, getattr(el, attr))
    for child in el.contents:
        copy.append(clone(child))
    return copy

This method is somewhat sensitive to the BeautifulSoup version in use. I tested it with 4.3; future versions may add attributes that need to be copied too.

You could also monkeypatch this functionality into BeautifulSoup:

from bs4 import Tag, NavigableString


def tag_clone(self):
    copy = type(self)(None, self.builder, self.name, self.namespace,
                      self.nsprefix)
    # work around bug where there is no builder set
    # https://bugs.launchpad.net/beautifulsoup/+bug/1307471
    copy.attrs = dict(self.attrs)
    for attr in ('can_be_empty_element', 'hidden'):
        setattr(copy, attr, getattr(self, attr))
    for child in self.contents:
        copy.append(child.clone())
    return copy


Tag.clone = tag_clone
NavigableString.clone = lambda self: type(self)(self)

letting you call .clone() on elements directly:

# the attribute filter is id=...; only class_ takes a trailing underscore
document2.body.append(document1.find('div', id='someid').clone())

My feature request to the BeautifulSoup project was accepted and tweaked to use the copy.copy() function; now that BeautifulSoup 4.4 is released you can use that version (or newer) and do:

import copy

# the attribute filter is id=...; only class_ takes a trailing underscore
document2.body.append(copy.copy(document1.find('div', id='someid')))
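
For illustration, here is a small self-contained sketch of the copy.copy() approach. The two documents and the 'someid' value are made up for the example, the built-in 'html.parser' is an arbitrary parser choice, and it assumes BeautifulSoup 4.4 or newer. It shows that the source element stays attached to document1 while an independent copy lands in document2:

```python
import copy

from bs4 import BeautifulSoup

# Hypothetical sample documents, made up for this example.
document1 = BeautifulSoup(
    '<div id="someid"><p>Hello <b>world</b></p></div>', 'html.parser')
document2 = BeautifulSoup('<html><body></body></html>', 'html.parser')

source = document1.find('div', id='someid')

# Append a copy, not the element itself, so document1 is left untouched.
document2.body.append(copy.copy(source))

# The original element is still part of document1 ...
still_in_document1 = document1.find('div', id='someid') is source

# ... and document2 holds a separate Tag with the same markup.
copied = document2.body.div
same_markup = str(copied) == str(source)

print(still_in_document1, same_markup, copied is not source)
```

Appending `source` directly instead of the copy would move it out of document1, which is the behaviour described in the question.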

answered Oct 17 '22 09:10

Martijn Pieters

It may not be the fastest solution, but it is short and seems to work...

clonedtag = BeautifulSoup(str(sourcetag)).body.contents[0]

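Depending on the parser, BeautifulSoup creates an extra <html><body>...</body></html> around the cloned tag (in order to make the "soup" a sane HTML document), and .body.contents[0] removes those wrapping tags. lxml and html5lib add the wrapper; the built-in html.parser does not, in which case .body is None and the cloned tag is found at .contents[0] directly.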

This idea was derived from Peter Woods's comment above and Clemens Klein-Robbenhaar's comment below.
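
A minimal sketch of the same round trip with an explicit parser, in case it helps. The sample markup is made up, and it assumes the built-in 'html.parser', which to my knowledge does not add the <html><body> wrapper, so the clone is read from .contents[0] without going through .body:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, made up for this example.
soup = BeautifulSoup('<div><p class="x">some <b>text</b></p></div>',
                     'html.parser')
sourcetag = soup.p

# Serialize the tag and parse the string again to get an independent tree.
# html.parser is assumed not to wrap the fragment in <html><body>.
clonedtag = BeautifulSoup(str(sourcetag), 'html.parser').contents[0]

# The clone has the same markup but is a different object,
# and the original tag is still attached to its own document.
print(str(clonedtag) == str(sourcetag))
print(clonedtag is not sourcetag)
print(soup.p is sourcetag)
```

With lxml or html5lib, the original .body.contents[0] form from the answer would be the one to use instead.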

answered Oct 17 '22 08:10

andrew pate

In Python, with BeautifulSoup 4.4 or newer, you can copy an element together with its children using copy.copy():

import copy
p_copy = copy.copy(soup.p)
print(p_copy)
# <p>I want <b>pizza</b> and more <b>pizza</b>!</p>

Ref: https://www.crummy.com/software/BeautifulSoup/bs4/doc/ Section: Copying Beautiful Soup objects

Regards.

answered Oct 17 '22 09:10

da7oom


Tags:

python

beautifulsoup

Anton Vernigor
