Selecting elements only if they have two classes and share the same first one

I have these elements in the HTML I want to parse:

<td class="line"> GARBAGE </td>
<td class="line text"> I WANT THAT </td>
<td class="line heading"> I WANT THAT </td>
<td class="line"> GARBAGE </td>

How can I write a CSS selector that selects elements having the class line plus some other class (heading, text, or anything else), but not elements whose only class is line?

I have tried:

 td[class=line.*]
 td.line.*
 td[class^=line.]

EDIT

I am using Python and BeautifulSoup:

import bs4
import requests

url = 'http://www.somewebsite'
res = requests.get(url)
res.raise_for_status()
DicoSoup = bs4.BeautifulSoup(res.text, "lxml")
elems = DicoSoup.select('body div#someid tr td.line')

I would like to change the last part, td.line, to something like td.line.whateverotherclass (but not td.line alone; otherwise my current selector would already suffice).

Mth Clv asked Oct 19 '22 03:10


1 Answer

What @BoltClock suggested is generally the correct way to approach the problem with CSS selectors. The only problem is that BeautifulSoup supports a limited number of CSS selectors. For instance, the :not() selector is not supported at the moment.

You can work around it with a "starts-with" attribute selector that checks whether the class attribute starts with line followed by a space (this is quite fragile, since it depends on line being listed first, but it works on your sample data):

for td in soup.select("td[class^='line ']"):
    print(td.get_text(strip=True))
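To see why the starts-with approach is fragile, here is a rough model in plain Python of what each check does to the class attribute value. It does not call BeautifulSoup at all; the function names starts_with_line and line_plus_other are made up for this illustration.

```python
def starts_with_line(class_attr: str) -> bool:
    """Roughly what td[class^='line '] tests: a prefix match on the attribute value."""
    return class_attr.startswith("line ")


def line_plus_other(class_attr: str) -> bool:
    """Token-based check: 'line' is present together with at least one other class."""
    tokens = set(class_attr.split())
    return "line" in tokens and len(tokens) > 1


# Compare both checks on a few class attribute values.
for value in ["line", "line text", "text line"]:
    print(f"{value!r}: prefix={starts_with_line(value)}, tokens={line_plus_other(value)}")
```

The two checks agree on "line" and "line text", but the prefix check misses "text line", where line is not the first class listed.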

Or you can solve it with find_all() and a filter function that checks that the class attribute contains line and at least one other class:

from bs4 import BeautifulSoup

data = """
<table>
    <tr>
        <td class="line"> GARBAGE </td>
        <td class="line text"> I WANT THAT </td>
        <td class="line heading"> I WANT THAT </td>
        <td class="line"> GARBAGE </td>
    </tr>
</table>"""
soup = BeautifulSoup(data, 'html.parser')

for td in soup.find_all(lambda tag: tag and tag.name == "td" and
                                    "class" in tag.attrs and "line" in tag["class"] and
                                    len(tag["class"]) > 1):
    print(td.get_text(strip=True))

Prints:

I WANT THAT
I WANT THAT
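If you would rather keep the select() call from the question, a possible variation is to select td.line as before and then filter the results in Python by the number of classes. This is only a sketch: the sample markup, the div#someid wrapper and the helper name has_line_plus_other are assumptions made for the example, not taken from the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup mirroring the structure implied by the question's selector.
html = """
<div id="someid"><table><tr>
  <td class="line"> GARBAGE </td>
  <td class="line text"> I WANT THAT </td>
  <td class="heading line"> I WANT THAT TOO </td>
</tr></table></div>
"""
soup = BeautifulSoup(html, "html.parser")


def has_line_plus_other(td):
    """True if the tag has class 'line' and at least one other distinct class."""
    classes = set(td.get("class", []))
    return "line" in classes and len(classes) > 1


# Select every td.line first, then drop those whose only class is 'line'.
wanted = [
    td.get_text(strip=True)
    for td in soup.select("div#someid tr td.line")
    if has_line_plus_other(td)
]
print(wanted)  # ['I WANT THAT', 'I WANT THAT TOO']
```

Because the check looks at the parsed class list rather than the raw attribute string, it does not depend on line being the first class.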
alecxe answered Oct 21 '22 05:10