Python BeautifulSoup find_all except

I'm struggling to find a simple way to solve this problem and hope you might be able to help.

I've been using BeautifulSoup's find_all and trying some regex to find all the items except the 'emptyItem' one in the HTML below:

<div class="product_item0 ">...</div>
<div class="product_item1 ">...</div>
<div class="product_item2 ">...</div>
<div class="product_item0 ">...</div>
<div class="product_item1 ">...</div>
<div class="product_item2 ">...</div>
<div class="product_item0 ">...</div>
<div class="product_item1 last">...</div>
<div class="product_item2 emptyItem">...</div>

Is there a simple way to find all the items except the one whose class includes 'emptyItem'?

blountdj asked Jan 31 '16 15:01



1 Answer

Just skip elements containing the emptyItem class. Working sample:

from bs4 import BeautifulSoup

data = """
<div>
    <div class="product_item0">test0</div>
    <div class="product_item1">test1</div>
    <div class="product_item2">test2</div>
    <div class="product_item2 emptyItem">empty</div>
</div>
"""

soup = BeautifulSoup(data, "html.parser")

for elm in soup.select("div[class^=product_item]"):
    if "emptyItem" in elm["class"]:  # skip elements having emptyItem class
        continue

    print(elm.get_text())

Prints:

test0
test1
test2

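Note that div[class^=product_item] is a CSS attribute selector: it matches every div whose class attribute value starts with product_item. A div that lists another class first (for example class="last product_item1") would not be matched.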
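As a possible alternative to skipping inside the loop, the exclusion can be moved into the query itself. The sketch below shows two variants on slightly different sample markup: a :not() pseudo-class appended to the same CSS selector, and a filter function passed to find_all(), since the question asked about find_all. The helper name is_real_product is made up for illustration, and the :not() variant assumes a Beautiful Soup version whose select() is backed by Soup Sieve (4.7.0 or later).

```python
from bs4 import BeautifulSoup

data = """
<div>
    <div class="product_item0">test0</div>
    <div class="product_item1 last">test1</div>
    <div class="product_item2 emptyItem">empty</div>
</div>
"""

soup = BeautifulSoup(data, "html.parser")

# Variant 1: let the CSS selector exclude the unwanted class via :not().
# Assumes select() is backed by Soup Sieve (bs4 >= 4.7.0).
via_select = soup.select('div[class^="product_item"]:not(.emptyItem)')


# Variant 2: a filter function for find_all().
# is_real_product is a hypothetical helper name, not part of bs4.
def is_real_product(tag):
    # With html.parser, "class" is a list of class names (or missing).
    classes = tag.get("class", [])
    return (
        tag.name == "div"
        and any(c.startswith("product_item") for c in classes)
        and "emptyItem" not in classes
    )


via_find_all = soup.find_all(is_real_product)

print([elm.get_text() for elm in via_select])
print([elm.get_text() for elm in via_find_all])
```

Both lists should contain only test0 and test1. The function variant checks each class name separately, so it would also accept a div whose product_item class is not listed first, which the attribute selector would miss.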

alecxe answered Sep 28 '22 03:09