Need to find text with RegEx and BeautifulSoup

Question

I'm trying to parse a website to pull out some data that is stored in the body such as this:

<body>
    <b>INFORMATION</b>
    Hookups: None
    Group Sites: No
    Station: No

    <b>Details</b>
    Ramp: Yes
</body>

I would like to use BeautifulSoup4 and RegEx to pull out the values for Hookups, Group Sites, and so on, but I am new to both bs4 and RegEx. I have tried the following to get the Hookups value:

import re
from bs4 import BeautifulSoup

soup = BeautifulSoup(open('doc.html'))
hookups = soup.find_all(re.compile("Hookups:(.*)Group"))

But the search comes back empty.

Explosion Pills · Accepted Answer

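A positional argument to BeautifulSoup's find_all is matched against tag names, not against the text inside them, which is why your search comes back empty. If the HTML really is this simple, a plain regex is enough to get what you need. Otherwise, use find_all to locate the elements and then read their text.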

import re
re.findall(r"Hookups: (.*)", open('doc.html').read())
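
As a minimal sketch of the pure-regex approach, assuming the markup is as simple as in the question: the HTML is inlined as a string here so the example runs without doc.html, and the names html and extract are only illustrative, not part of any library.

```python
import re

# Sample markup copied from the question, inlined so no file is needed.
html = """<body>
    <b>INFORMATION</b>
    Hookups: None
    Group Sites: No
    Station: No

    <b>Details</b>
    Ramp: Yes
</body>"""


def extract(label, text):
    """Return the value following 'label:' on the same line, or None."""
    # [ \t]* skips spaces after the colon without crossing a line break;
    # [^\n<]* captures up to the end of the line or the next tag.
    match = re.search(re.escape(label) + r":[ \t]*([^\n<]*)", text)
    return match.group(1).strip() if match else None


hookups = extract("Hookups", html)
group_sites = extract("Group Sites", html)
```

Anchoring the capture to a single line avoids depending on which label happens to come next, which the "Hookups:(.*)Group" pattern relies on.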

You can also search by text content with the text argument, available as of BeautifulSoup 4.2:

soup.find_all(text=re.compile("Hookups:(.*)Group"))

EDIT: Since BeautifulSoup 4.4, the text argument is named string.
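
Here is a rough sketch of the text-search approach, assuming BeautifulSoup 4.4 or later (for the string argument) and the built-in html.parser. One caveat: in Python's re module, . does not match a newline by default, so a pattern like "Hookups:(.*)Group" only matches when both labels sit on the same line of the text node. The sketch therefore searches for the label alone and splits the matched text node into lines. The names html and values are illustrative.

```python
import re
from bs4 import BeautifulSoup

# Sample markup copied from the question, inlined so no file is needed.
html = """<body>
    <b>INFORMATION</b>
    Hookups: None
    Group Sites: No
    Station: No

    <b>Details</b>
    Ramp: Yes
</body>"""

soup = BeautifulSoup(html, "html.parser")

# string= (named text= before 4.4) is matched against text nodes, not tags.
nodes = soup.find_all(string=re.compile(r"Hookups:"))

values = {}
for node in nodes:
    # The matched text node holds several "Label: value" lines.
    for line in node.splitlines():
        label, sep, value = line.partition(":")
        if sep:  # skip blank lines and lines without a colon
            values[label.strip()] = value.strip()
```

Only the text node that contains "Hookups:" is returned, so values ends up with the entries under INFORMATION; the Ramp line lives in a separate text node after the second b tag and would need its own search.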

Tags: python, regex, beautifulsoup, python-2.7, web-scraping

bcoop713
