How to grab all headers from a website using BeautifulSoup?

Question

I'm trying to grab all the headers from a simple website. My attempt:

from bs4 import BeautifulSoup
import requests

url = "http://nypost.com/business"
page = requests.get(url)
data = page.text
soup = BeautifulSoup(data, "html.parser")  # name the parser explicitly to avoid a GuessedAtParserWarning
soup.find_all('h')  # returns []

soup.find_all('h') returns [], but soup.h1 or soup.h2 returns the respective tag. Am I just calling the method incorrectly?
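The behaviour described above can be reproduced offline. This is a minimal sketch that assumes a small made-up HTML string in place of the live page and uses Python's built-in "html.parser"; it contrasts find_all() with a plain string against attribute access such as soup.h2.

```python
from bs4 import BeautifulSoup

# Made-up sample page standing in for the real site (assumption: no network access).
html = """
<html><body>
  <h1>Markets</h1>
  <h2>Stocks</h2>
  <p>Some text</p>
  <h2>Bonds</h2>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# A plain string must equal the tag name exactly, so "h" only matches <h> tags,
# and HTML has none.
print(soup.find_all("h"))   # []

# Attribute access returns only the first tag with that exact name.
print(soup.h2)              # <h2>Stocks</h2>

# find_all() with an exact name returns every matching tag.
print(soup.find_all("h2"))  # [<h2>Stocks</h2>, <h2>Bonds</h2>]
```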

phd · Accepted Answer

Filter by regular expression:

import re

soup.find_all(re.compile('^h[1-6]$'))

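This regex matches tag names that consist of exactly an "h" followed by a single digit from 1 to 6, i.e. h1 through h6. The ^ and $ anchors keep the match to the whole name; a looser pattern such as ^h would also pick up html, head, header and hr. Your original call returned [] because a plain string passed to find_all() must equal the tag name exactly, and HTML has no <h> tag.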
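A self-contained sketch of the regex filter, assuming an inline sample document instead of the downloaded page. The sample deliberately includes <html>, <header> and <hr>, which also begin with "h", to show that the anchored pattern skips them. It also shows passing a list of tag names to find_all(), which Beautiful Soup supports and which gives the same result without a regex.

```python
import re

from bs4 import BeautifulSoup

# Made-up sample HTML (assumption: stands in for page.text from requests).
html = """
<html><body>
  <h1>Business</h1>
  <header>Site banner</header>
  <h3>Earnings</h3>
  <hr>
  <h6>Footnote</h6>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# Regex filter: names that are exactly "h" plus one digit 1-6.
# <html>, <header> and <hr> start with "h" but do not match the anchored pattern.
headings = soup.find_all(re.compile(r"^h[1-6]$"))
print([(tag.name, tag.get_text(strip=True)) for tag in headings])
# [('h1', 'Business'), ('h3', 'Earnings'), ('h6', 'Footnote')]

# Equivalent without a regex: find_all() also accepts a list of tag names.
same = soup.find_all(["h1", "h2", "h3", "h4", "h5", "h6"])
print([tag.name for tag in same])
# ['h1', 'h3', 'h6']
```

Both calls walk the tree in document order, so the headings come back in the order they appear on the page.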

Tags: python, beautifulsoup, python-requests, web-scraping

Asked by hiimarksman