
How to extract h1 tag text with beautifulsoup

I'd like to understand how to extract the text of an h1 tag that contains many other tags, using Beautiful Soup:

<h1 class="listing-name">
Hôtel Vevey 
<span class="entry-feedbacks-summary-title-rating-stars-container bootstrap">
<span class="entry-feedbacks-summary-title-rating-stars entry-feedbacks-summary-title-rating-stars-empty" data-container=".entry-feedbacks-summary-title-rating-stars-container" data-content="Il n'y a pas encore d'avis de clients à propos de Astra Hôtel Vevey 4*sup. Cliquez pour évaluer." data-placement="right" data-toggle="popover" data-trigger="hover" data-original-title="" title="">
<a class="feedback-login-link entry-feedbacks-header-link" href="/auth/localch?origin=https%3A%2F%2Ftel.local.ch%2Ffr%2Fd%2FVevey%2F1800%2FHotel%2FAstra-Hotel-Vevey-4sup-SVGb8b5z-QdrzGTddmyAAg%3Fwhat%3DHotel%26where%3DVaud%2B%2528Canton%2529%23entry-feedbacks-bottom-rate-button"><span class="entry-feedback-rating-star">
<i class="icon-star-outline entry-feedback-rating-star-empty"></i>
</span>
<span class="entry-feedback-rating-star">
<i class="icon-star-outline entry-feedback-rating-star-empty"></i>
</span>
<span class="entry-feedback-rating-star">
<i class="icon-star-outline entry-feedback-rating-star-empty"></i>
</span>
<span class="entry-feedback-rating-star">
<i class="icon-star-outline entry-feedback-rating-star-empty"></i>
</span>
<span class="entry-feedback-rating-star">
<i class="icon-star-outline entry-feedback-rating-star-empty"></i>
</span>

</a></span>

</span>
</h1>

I'm trying to extract only the text that comes directly after the opening h1 tag, "Hôtel Vevey".

import requests
from bs4 import BeautifulSoup

url = "https://tel.local.ch/fr/d/Vevey/1800/Hotel/Astra-Hotel-Vevey-4sup-SVGb8b5z-QdrzGTddmyAAg?what=Hotel&where=Vaud+%28Canton%29"
get_url = requests.get(url)
get_text = get_url.text
soup = BeautifulSoup(get_text, "html.parser")

company = soup.find_next('h1', 'class:listing-name')


print(company)

It prints "None".

jjyoh asked Nov 21 '16 09:11




2 Answers

For the link you provided, you can get it like this:

company = soup.select('h1.listing-name')[0].text.strip()
print(company)

Output:

Astra Hôtel Vevey 4*sup
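
If the nested tags ever contain text of their own, `.text` will include it too. Here is a minimal sketch of one way to keep only the h1's direct text, assuming a simplified, made-up stand-in for the page markup (the `rating` span and its "5 stars" text are hypothetical, not from the real page):

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the page's markup.
html = """
<h1 class="listing-name">
Hôtel Vevey
<span class="rating"><a href="#">5 stars</a></span>
</h1>
"""
soup = BeautifulSoup(html, "html.parser")
h1 = soup.select_one("h1.listing-name")

# Text of the h1 and everything nested inside it.
all_text = h1.get_text(" ", strip=True)

# Only the text nodes that are direct children of the h1.
direct_text = "".join(h1.find_all(string=True, recursive=False)).strip()

print(all_text)     # Hôtel Vevey 5 stars
print(direct_text)  # Hôtel Vevey
```

`recursive=False` limits the search to direct children, so text inside the span is skipped.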
Mohammad Yusuf answered Sep 22 '22 20:09


Try using a dictionary:

company = soup.find('h1', {'class': 'listing-name'})

Or the following:

company = soup.find('h1', class_='listing-name')

Note the underscore after class. This is needed because class is a reserved word in Python.

More info can be found here: https://www.crummy.com/software/BeautifulSoup/bs4/doc/#attrs
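
As a rough sketch of why the original call printed None: a bare string passed in the attrs position is treated as a CSS class name, so 'class:listing-name' is looked up as a literal class that no tag has. The one-line HTML below is a made-up stand-in for the real page, used only so the example runs offline:

```python
from bs4 import BeautifulSoup

# Hypothetical one-line stand-in for the real page.
html = '<div><h1 class="listing-name">Hôtel Vevey</h1></div>'
soup = BeautifulSoup(html, "html.parser")

# The string is read as a class name, i.e. class="class:listing-name",
# which matches nothing.
broken = soup.find_next("h1", "class:listing-name")

# Both forms from this answer match on the actual class value.
by_dict = soup.find("h1", {"class": "listing-name"})
by_keyword = soup.find("h1", class_="listing-name")

print(broken)                        # None
print(by_dict.get_text(strip=True))  # Hôtel Vevey
print(by_dict is by_keyword)         # True
```

Either form returns the tag itself; call `.get_text(strip=True)` or `.text.strip()` on it to get the string.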

narko answered Sep 26 '22 20:09