Python & Beautiful Soup: Searching only in a certain class

Question

I'm writing a script to capture the independence dates of a few countries from Wikipedia.

For example, with Kazakhstan:

import re

import requests
from bs4 import BeautifulSoup

URL_QS = 'https://en.wikipedia.org/wiki/Kazakhstan'
r = requests.get(URL_QS)
soup = BeautifulSoup(r.text, 'lxml')

# Stays None if no independence row is found (avoids a NameError at print time)
independ_date = None

# Only keep the infobox (top right)
infobox = soup.find("table", class_="infobox geography vcard")

if infobox:
    formation = infobox.find_next(text=re.compile("Formation"))

    if formation:
        independence = formation.find_next(text=re.compile("independence"))

        if independence:
            independ_date = independence.find_next("td").text
        else:
            independence = formation.find_next(text=re.compile("Independence"))

            if independence:
                independ_date = independence.find_next("td").text

print(independ_date)

I get the following output:

Almaty

This value is not in the infobox but further down, in the article text. It happens because formation.find_next(text=re.compile("independence")) found a match outside the infobox, but I don't understand why the search isn't limited to the infobox. How can I search only within this element?

Thank you in advance for your help!
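The behaviour can be reproduced offline. The sketch below uses a small, made-up HTML string as a hypothetical stand-in for the Wikipedia page (it is not the real markup) and Python's built-in "html.parser" instead of lxml so it needs no extra parser. Inside the fake infobox the row label is capitalized, so the lowercase search finds nothing there and carries on into the text that follows the table.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical, heavily simplified stand-in for the Wikipedia page:
# inside the infobox the row label is capitalized ("Independence"),
# while the article text after the table uses lowercase "independence".
HTML = """
<table class="infobox geography vcard">
  <tr><th>Formation</th></tr>
  <tr><th>Independence</th><td>16 December 1991</td></tr>
</table>
<p>After independence the capital moved from</p>
<table><tr><td>Almaty</td></tr></table>
"""

soup = BeautifulSoup(HTML, "html.parser")
infobox = soup.find("table", class_="infobox geography vcard")

formation = infobox.find_next(string=re.compile("Formation"))
# find_next() walks forward through the whole document in parse order,
# not just through the infobox's descendants, so when the infobox has no
# lowercase match it keeps going past </table> into the article text.
independence = formation.find_next(string=re.compile("independence"))
independ_date = independence.find_next("td").text

print(independence)   # prints: After independence the capital moved from
print(independ_date)  # prints: Almaty
```

The matched string's parent is the <p> after the table, not anything inside the infobox, which is exactly the leak described above.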

Nik Markin · Accepted Answer

It's because "formation.find_next(text = re.compile("independence"))" found something outside of the infobox

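Add .extract() to your soup.find() call so that the search runs only inside the "infobox geography vcard" element. find_next() walks forward through the document in parse order; it is not limited to the descendants of the element you start from. extract() removes the table from the document and makes it the root of its own tree, so find_next() called on anything inside it stops at the end of the table instead of continuing into the article text.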

infobox = soup.find("table", class_="infobox geography vcard").extract()
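A minimal sketch of the answer's fix, again on a hypothetical, simplified HTML stand-in rather than the real Wikipedia markup, and using "html.parser" instead of lxml. date_via_extract() keeps the question's two-branch lookup but runs it on an extracted infobox, and guards against soup.find() returning None (calling .extract() on None would raise AttributeError). date_via_find() is a swapped-in alternative, not part of the original answer: find() only searches an element's descendants, so no extraction is needed, and re.IGNORECASE replaces the two case-specific branches. Both function names are made up for this illustration.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the Wikipedia page (not the real markup).
HTML = """
<table class="infobox geography vcard">
  <tr><th>Formation</th></tr>
  <tr><th>Independence</th><td>16 December 1991</td></tr>
</table>
<p>After independence the capital moved from</p>
<table><tr><td>Almaty</td></tr></table>
"""


def date_via_extract(html):
    """The question's lookup, run on an extracted (detached) infobox."""
    soup = BeautifulSoup(html, "html.parser")
    infobox = soup.find("table", class_="infobox geography vcard")
    if infobox is None:  # soup.find() returns None when nothing matches
        return None
    # Detach the table: its last descendant no longer links to the article
    # text, so find_next() cannot walk past the end of the infobox.
    infobox = infobox.extract()

    formation = infobox.find_next(string=re.compile("Formation"))
    if formation is None:
        return None
    # The lowercase search now returns None instead of leaking out...
    independence = formation.find_next(string=re.compile("independence"))
    if independence is None:
        # ...so the capitalized branch is reached and matches the infobox row.
        independence = formation.find_next(string=re.compile("Independence"))
    if independence is None:
        return None
    cell = independence.find_next("td")
    return cell.get_text(strip=True) if cell else None


def date_via_find(html):
    """Alternative that leaves the tree intact: find() only searches descendants."""
    soup = BeautifulSoup(html, "html.parser")
    infobox = soup.find("table", class_="infobox geography vcard")
    if infobox is None:
        return None
    # One case-insensitive search instead of two case-specific ones.
    label = infobox.find(string=re.compile("independence", re.IGNORECASE))
    if label is None:
        return None
    # Assumes the date sits in the same <tr> as the label.
    row = label.find_parent("tr")
    cell = row.find("td") if row else None
    return cell.get_text(strip=True) if cell else None


print(date_via_extract(HTML))  # prints: 16 December 1991
print(date_via_find(HTML))     # prints: 16 December 1991
```

date_via_find() relies on the label and the date sharing a table row, which is an assumption about the layout; the real Kazakhstan infobox may be structured differently, so check the fetched HTML before relying on either function.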


Tags: python, beautifulsoup

jGsch
