Iterate through elements in html tree using BeautifulSoup, and produce an output that maintains the relative position of each element? in Python

Question

I have this code that does what I need using Jsoup in Java:

Elements htmlTree = doc.body().select("*");
Elements menuElements = new Elements();

for (Element element : htmlTree) {
    if (element.hasClass("header"))
        menuElements.add(element);
    if (element.hasClass("name"))
        menuElements.add(element);
    if (element.hasClass("quantity"))
        menuElements.add(element);
}

I want to do the same thing but in Python using BeautifulSoup. An example tree of the HTML I'm trying to scrape follows:

<div class="header"> content </div>
     <div class="name"> content </div>
     <div class="quantity"> content </div>
     <div class="name"> content </div>
     <div class="quantity"> content </div>
<div class="header"> content2 </div>
     <div class="name"> content2 </div>
     <div class="quantity"> content2 </div>
     <div class="name"> content2 </div>
     <div class="quantity"> content2 </div>

etc.

Basically I want the output to preserve the relative positions of each element. How would I go about doing that using Python and BeautifulSoup?

EDIT:

This is the Python code I have (it's very naive), but maybe it can help:

output = []

for e in soup :
  if e["class"] == "pickmenucolmenucat" :
    output.append(e)
  if e["class"] == "pickmenucoldispname" :
    output.append(e)
  if e["class"] == "pickmenucolportions" :
    output.append(e)

jfs · Accepted Answer

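To find all <div> elements whose class attribute matches any name from a given list (find_all() returns them in document order, so their relative positions are preserved):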

#!/usr/bin/env python
from bs4 import BeautifulSoup # $ pip install beautifulsoup4

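with open('input.xml', 'rb') as file:
    # Name the parser explicitly; 'html.parser' ships with Python
    soup = BeautifulSoup(file, 'html.parser')

# A list passed to class_ matches elements that have any of these classes
elements = soup.find_all("div", class_="header name quantity".split())
print("\n".join("{} {}".format(el['class'], el.get_text()) for el in elements))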

Output

['header']  content 
['name']  content 
['quantity']  content 
['name']  content 
['quantity']  content 
['header']  content2 
['name']  content2 
['quantity']  content2 
['name']  content2 
['quantity']  content2
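
Because the flat list keeps document order, it can be regrouped under each header afterwards. The following is only a sketch: the sample text values and the `sections` structure are made up for illustration, and it assumes each quantity follows the name it belongs to.

```python
from bs4 import BeautifulSoup

# Hypothetical sample data shaped like the HTML in the question
html = """
<div class="header"> Entrees </div>
<div class="name"> Pasta </div>
<div class="quantity"> 1 cup </div>
<div class="name"> Soup </div>
<div class="quantity"> 8 oz </div>
<div class="header"> Desserts </div>
<div class="name"> Pie </div>
<div class="quantity"> 1 slice </div>
"""

soup = BeautifulSoup(html, "html.parser")
elements = soup.find_all("div", class_=["header", "name", "quantity"])

sections = []  # one dict per header, in document order
for el in elements:
    kind = el["class"][0]  # assumes the wanted class is listed first
    text = el.get_text(strip=True)
    if kind == "header":
        sections.append({"header": text, "items": []})
    elif kind == "name" and sections:
        sections[-1]["items"].append({"name": text, "quantity": None})
    elif kind == "quantity" and sections and sections[-1]["items"]:
        # attach the quantity to the most recent name
        sections[-1]["items"][-1]["quantity"] = text

for section in sections:
    print(section["header"], section["items"])
```

Elements that appear before the first header are skipped in this sketch; whether that is the right behaviour depends on the page being scraped.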

There are also other methods that allow you to search and traverse HTML elements.
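
One such alternative, closer to the Jsoup loop in the question, is to pass a function to find_all() so that any tag (not only <div>) with one of the wanted classes is matched. This is a minimal sketch: the sample HTML, the `WANTED` set and the `has_wanted_class` helper are illustrative names, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Hypothetical sample input; the <span> shows that the tag name does not matter
html = """
<div class="header"> content </div>
<div class="name"> content </div>
<div class="quantity"> content </div>
<div class="header"> content2 </div>
<span class="name"> content2 </span>
<div class="other"> ignored </div>
"""

WANTED = {"header", "name", "quantity"}

def has_wanted_class(tag):
    # tag.get("class") is a list of class names, or None when the attribute is absent
    return bool(WANTED.intersection(tag.get("class") or []))

soup = BeautifulSoup(html, "html.parser")
matches = soup.find_all(has_wanted_class)  # results come back in document order

labels = [(el["class"][0], el.get_text(strip=True)) for el in matches]
for label in labels:
    print(label)
```

Unlike the Java loop, an element carrying two of the wanted classes is returned once here rather than added once per class.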

Tags:

python

html-parsing

beautifulsoup

web-scraping

jsoup

Christian
