Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Find "a" element in BS4 by partial class name not working?

I want to find an a element in a soup object by a substring present in its class name. This particular element will always have JobTitle inside the class name, with random preceding and trailing characters, so I need to locate it by its substring of JobTitle.

You can see the element here:

htmlToParse

It's safe to assume there is only 1 a element to find, so using find should work, however my attempts (there have been more than the 2 shown below) have not worked. I've also included the top elements in case it's relevant for location for some reason.

I'm on Windows 10, Python 3.10.5, and BS4 4.11.1.

I've created a reproducible example below (I thought the regex way would have worked, but I guess not):

import re
from bs4 import BeautifulSoup

# Parse this HTML, getting the only a['href'] in it (line 22)
html_to_parse = """
    <li>
        <div class="cardOutline tapItem fs-unmask result job_5ef6bf779263a83c sponsoredJob resultWithShelf sponTapItem desktop vjs-highlight">
            <div class="slider_container css-g7s71f eu4oa1w0">
            <div class="slider_list css-kyg8or eu4oa1w0">
                <div class="slider_item css-kyg8or eu4oa1w0">
                <div class="job_seen_beacon">
                    <div class="fe_logo">
                    <img alt="CyberCoders logo" class="feLogoImg desktop" src="https://d2q79iu7y748jz.cloudfront.net/s/_squarelogo/256x256/f0b43dcaa7850e2110bc8847ebad087b" />
                    </div>
                    <table cellpadding="0" cellspacing="0" class="jobCard_mainContent big6_visualChanges" role="presentation">
                    <tbody>
                        <tr>
                        <td class="resultContent">
                            <div class="css-1xpvg2o e37uo190">
                            <h2 class="jobTitle jobTitle-newJob css-bdjp2m eu4oa1w0" tabindex="-1">
                                <a aria-label="full details of REMOTE Senior Python Developer" class="jcs-JobTitle css-jspxzf eu4oa1w0" data-ci="385558680" data-empn="8690912762161442" data-hide-spinner="true" data-hiring-event="false" data-jk="5ef6bf779263a83c" data-mobtk="1g9u19rmn2ea6000" data-tu="https://jsv3.recruitics.com/partner/a51b8de1-f7bf-11e7-9edd-d951492604d9.gif?client=521&amp;rx_c=&amp;rx_campaign=indeed16&amp;rx_group=110383&amp;rx_source=Indeed&amp;job=KE2-168714218&amp;rx_r=none&amp;rx_ts=20220808T034442Z&amp;rx_pre=1&amp;indeed=sp" href="/pagead/clk?mo=r&amp;ad=-6NYlbfkN0CpFJQzrgRR8WqXWK1qKKEqALWJw739KlKqr2H-MSI4eoBlI4EFrmor2FYZMP3muM35UEpv7D8dnBwRFuIf8XmtgYykaU5Nl3fSsXZ8xXiGdq3dZVwYJYR2-iS1SqyS7j4jGQ4Clod3n72L285Zn7LuBKMjFoBPi4tB5X2mdRnx-UikeGviwDC-ahkoLgSBwNaEmvShQxaFt_IoqJP6OlMtTd7XlgeNdWJKY9Ph9u8n4tcsN_tCjwIc3RJRtS1O7U0xcsVy5Gi1JBR1W7vmqcg5n4WW1R_JnTwQQ8LVnUF3sDzT4IWevccQb289ocL5T4jSfRi7fZ6z14jrR6bKwoffT6ZMypqw4pXgZ0uvKv2v9m3vJu_e5Qit1D77G1lNCk9jWiUHjWcTSYwhhwNoRzjAwd4kvmzeoMJeUG0gbTDrXFf3V2uJQwjZhTul-nbfNeFPRX6vIb4jgiTn4h3JVq-zw0woq3hTrLq1z9Xpocf5lIGs9U7WJnZM-Mh7QugzLk1yM3prCk7tQYRl3aKrDdTsOdbl5Afs1DkatDI7TgQgFrr5Iauhiv7I9Ss-fzPJvezhlYR4hjkkmSSAKr3Esz06bh5GlZKFONpq1I0IG5aejSdS_kJUhnQ1D4Uj4x7X_mBBN-fjQmL_CdyWM1FzNNK0cZwdLjKL-d8UK1xPx3MS-O-WxVGaMq0rn4lyXgOx7op9EHQ2Qdxy9Dbtg6GNYg5qBv0iDURQqi7_MNiEBD-AaEyqMF3riCBJ4wQiVaMjSTiH_DTyBIsYc0UsjRGG4a949oMHZ8yL4mGg57QUvvn5M_urCwCtQTuyWZBzJhWFmdtcPKCn7LpvKTFGQRUUjsr6mMFTQpA0oCYSO7E-w2Kjj0loPccA9hul3tEwQm1Eh58zHI7lJO77kseFQND7Zm9OMz19oN45mvwlEgHBEj4YcENhG6wdB6M5agUoyyPm8fLCTOejStoecXYnYizm2tGFLfqNnV-XtyDZNV_sQKQ2TQ==&amp;xkcb=SoD0-_M3b-KooEWCyR0LbzkdCdPP&amp;p=0&amp;fvj=0&amp;vjs=3" id="sj_5ef6bf779263a83c" role="button" target="_blank">
                                <span id="jobTitle-5ef6bf779263a83c" title="REMOTE Senior Python Developer">REMOTE Senior Python Developer</span>
                                </a>
                            </h2>
                            </div>
                        </td>
                        </tr>
                    </tbody>
                    </table>
                </div>
                </div>
            </div>
            </div>
        </div>
    </li>
"""

# Soupify it
soup = BeautifulSoup(html_to_parse, "html.parser")

# Start by making sure "find_all("a")" works
all_links = soup.find_all("a")
print(all_links)
# Good.

# Attempt 1
job_url = soup.find('a[class*="JobTitle"]').a['href']
print(job_url)
# Nope.

# Attempt 2
job_url = soup.find("a", {"class": re.compile("^.*jobTitle.*")}).a['href']
print(job_url)
# Nope...
like image 462
Matt Wilson Avatar asked Oct 30 '25 19:10

Matt Wilson


2 Answers

To find an element with partial class name you need to use select, not find. The will give you the <a> tag, the href will be in it

job_url = soup.select_one('a[class*="JobTitle"]')['href']
print(job_url)
# /pagead/clk?mo=r&ad=-6NYlbfkN0CpFJQzrgRR8WqXWK1qKKEqALWJw739KlKqr2H-MSI4eoBlI4EFrmor2FYZMP3muM35UEpv7D8dnBwRFuIf8XmtgYykaU5Nl3fSsXZ8xXiGdq3dZVwYJYR2-iS1SqyS7j4jGQ4Clod3n72L285Zn7LuBKMjFoBPi4tB5X2mdRnx-UikeGviwDC-ahkoLgSBwNaEmvShQxaFt_IoqJP6OlMtTd7XlgeNdWJKY9Ph9u8n4tcsN_tCjwIc3RJRtS1O7U0xcsVy5Gi1JBR1W7vmqcg5n4WW1R_JnTwQQ8LVnUF3sDzT4IWevccQb289ocL5T4jSfRi7fZ6z14jrR6bKwoffT6ZMypqw4pXgZ0uvKv2v9m3vJu_e5Qit1D77G1lNCk9jWiUHjWcTSYwhhwNoRzjAwd4kvmzeoMJeUG0gbTDrXFf3V2uJQwjZhTul-nbfNeFPRX6vIb4jgiTn4h3JVq-zw0woq3hTrLq1z9Xpocf5lIGs9U7WJnZM-Mh7QugzLk1yM3prCk7tQYRl3aKrDdTsOdbl5Afs1DkatDI7TgQgFrr5Iauhiv7I9Ss-fzPJvezhlYR4hjkkmSSAKr3Esz06bh5GlZKFONpq1I0IG5aejSdS_kJUhnQ1D4Uj4x7X_mBBN-fjQmL_CdyWM1FzNNK0cZwdLjKL-d8UK1xPx3MS-O-WxVGaMq0rn4lyXgOx7op9EHQ2Qdxy9Dbtg6GNYg5qBv0iDURQqi7_MNiEBD-AaEyqMF3riCBJ4wQiVaMjSTiH_DTyBIsYc0UsjRGG4a949oMHZ8yL4mGg57QUvvn5M_urCwCtQTuyWZBzJhWFmdtcPKCn7LpvKTFGQRUUjsr6mMFTQpA0oCYSO7E-w2Kjj0loPccA9hul3tEwQm1Eh58zHI7lJO77kseFQND7Zm9OMz19oN45mvwlEgHBEj4YcENhG6wdB6M5agUoyyPm8fLCTOejStoecXYnYizm2tGFLfqNnV-XtyDZNV_sQKQ2TQ==&xkcb=SoD0-_M3b-KooEWCyR0LbzkdCdPP&p=0&fvj=0&vjs=3
like image 105
Guy Avatar answered Nov 02 '25 09:11

Guy


The CSS selector only works with the .select() method.

See documentation here. https://www.crummy.com/software/BeautifulSoup/bs4/doc/#css-selectors

Change your code to something like

job_links = soup.select('a[class*="JobTitle"]')
print(job_links)
for job_link in job_links:
    print(job_link.get("href"))
like image 36
fooiey Avatar answered Nov 02 '25 08:11

fooiey



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!