I'm trying to grab all the headers from a simple website. My attempt:
from bs4 import BeautifulSoup, SoupStrainer
import requests
url = "http://nypost.com/business"
page = requests.get(url)
data = page.text
soup = BeautifulSoup(data)
soup.find_all('h')
soup.find_all('h') returns [], but if I do something like soup.h1 or soup.h2, it returns that respective data. Am I just calling the method incorrectly?
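find_all('h') looks for tags named exactly h, and HTML has no such tag, so it returns []. Filter by regular expression instead:
import re
soup.find_all(re.compile('^h[1-6]$'))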
This regex matches tag names that start with h, have a single digit from 1 to 6 after the h, and then end after the digit.
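Here is a minimal sketch of the regex filter that runs without a network call. The sample markup and the variable names are made up for illustration and are not taken from the nypost.com page, and the parser is set explicitly to html.parser as an assumption. It also shows passing a list of tag names to find_all, which should give the same result as the regex.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for page.text from the question.
html = """
<html><body>
<h1>Business</h1>
<header>Site banner</header>
<h2>Markets</h2>
<p>Some text</p>
<h3>Tech</h3>
<hr>
</body></html>
"""

# Naming the parser explicitly avoids the "no parser specified" warning.
soup = BeautifulSoup(html, "html.parser")

# Regex filter: matches tag names h1 through h6 only,
# so <html>, <header> and <hr> are skipped.
by_regex = soup.find_all(re.compile(r"^h[1-6]$"))

# Alternative: pass the tag names as a list.
by_list = soup.find_all(["h1", "h2", "h3", "h4", "h5", "h6"])

regex_names = [tag.name for tag in by_regex]
list_names = [tag.name for tag in by_list]
heading_texts = [tag.get_text(strip=True) for tag in by_regex]

# The original attempt: no tag is literally named "h".
plain_h = soup.find_all("h")

print(regex_names, heading_texts, plain_h)
```

The anchors ^ and $ matter here: without them, a pattern like h[1-6] could also match any other tag whose name merely contains that sequence, and a bare 'h' pattern would match html, head and header as well.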