How to extract h1 tag text with BeautifulSoup

Tags:

python

beautifulsoup

I'd like to understand how to extract the text of an h1 tag that contains many other tags, using Beautiful Soup:

<h1 class="listing-name">
Hôtel Vevey 
<span class="entry-feedbacks-summary-title-rating-stars-container bootstrap">
<span class="entry-feedbacks-summary-title-rating-stars entry-feedbacks-summary-title-rating-stars-empty" data-container=".entry-feedbacks-summary-title-rating-stars-container" data-content="Il n'y a pas encore d'avis de clients à propos de Astra Hôtel Vevey 4*sup. Cliquez pour évaluer." data-placement="right" data-toggle="popover" data-trigger="hover" data-original-title="" title="">
<a class="feedback-login-link entry-feedbacks-header-link" href="/auth/localch?origin=https%3A%2F%2Ftel.local.ch%2Ffr%2Fd%2FVevey%2F1800%2FHotel%2FAstra-Hotel-Vevey-4sup-SVGb8b5z-QdrzGTddmyAAg%3Fwhat%3DHotel%26where%3DVaud%2B%2528Canton%2529%23entry-feedbacks-bottom-rate-button"><span class="entry-feedback-rating-star">
<i class="icon-star-outline entry-feedback-rating-star-empty"></i>
</span>
<span class="entry-feedback-rating-star">
<i class="icon-star-outline entry-feedback-rating-star-empty"></i>
</span>
<span class="entry-feedback-rating-star">
<i class="icon-star-outline entry-feedback-rating-star-empty"></i>
</span>
<span class="entry-feedback-rating-star">
<i class="icon-star-outline entry-feedback-rating-star-empty"></i>
</span>
<span class="entry-feedback-rating-star">
<i class="icon-star-outline entry-feedback-rating-star-empty"></i>
</span>

</a></span>

</span>
</h1>

I'm trying to extract just the text that directly follows the opening h1 tag: "Hôtel Vevey".

import requests
from bs4 import BeautifulSoup

url = "https://tel.local.ch/fr/d/Vevey/1800/Hotel/Astra-Hotel-Vevey-4sup-SVGb8b5z-QdrzGTddmyAAg?what=Hotel&where=Vaud+%28Canton%29"
get_url = requests.get(url)
get_text = get_url.text
soup = BeautifulSoup(get_text, "html.parser")

company = soup.find_next('h1', 'class:listing-name')


print(company)

It returns None.
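The None comes from the second argument. When it is a string rather than a dictionary, Beautiful Soup treats it as a CSS class to match, so 'class:listing-name' looks for an h1 whose class is literally "class:listing-name". The sketch below parses a trimmed inline copy of the markup above (an assumption, so that no request to the live site is needed) and uses find(), which searches the whole document; find_next() is meant for searching the elements that come after a given element.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's markup (rating spans shortened);
# parsing a string avoids a network request.
html = """
<h1 class="listing-name">
Hôtel Vevey
<span class="entry-feedbacks-summary-title-rating-stars-container bootstrap">
<span class="entry-feedbacks-summary-title-rating-stars"></span>
</span>
</h1>
"""
soup = BeautifulSoup(html, "html.parser")

# A plain string as the second positional argument is treated as a CSS class,
# so this searches for an <h1> whose class is literally "class:listing-name".
wrong = soup.find('h1', 'class:listing-name')

# Passing only the class name matches class="listing-name".
right = soup.find('h1', 'listing-name')

print(wrong)       # None
print(right.name)  # h1
```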

asked Nov 21 '16 09:11

jjyoh

2 Answers

For the link you provided, you can get it like this:

company = soup.select('h1.listing-name')[0].text.strip()
print(company)

Output:

Astra Hôtel Vevey 4*sup
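A slightly more defensive sketch of the same idea. It uses hypothetical markup in which the nested span contains text (the empty rating spans in the question do not), to show that .text joins the text of every descendant, whereas find(string=True, recursive=False) returns only the first text node sitting directly inside the h1, which is the "just after the h1 tag" text asked about. select_one() returns None instead of raising IndexError when nothing matches.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: unlike the question's empty rating spans, this
# <span> holds text, so .text and the direct text node differ.
html = '<h1 class="listing-name">\nHôtel Vevey\n<span>(no reviews yet)</span>\n</h1>'
soup = BeautifulSoup(html, "html.parser")

# select_one() returns the first match or None, so check it
# instead of indexing [0] on a possibly empty list.
h1 = soup.select_one('h1.listing-name')
if h1 is None:
    raise SystemExit("no <h1 class='listing-name'> found")

# .text concatenates the text of all descendants, including the <span>.
all_text = h1.text.strip()

# Only the first text node that is a direct child of <h1>;
# text inside nested tags is skipped.
own_text = h1.find(string=True, recursive=False).strip()

print(all_text)
print(own_text)  # Hôtel Vevey
```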

answered Sep 22 '22 20:09

Mohammad Yusuf

Try passing the attributes as a dictionary:

company = soup.find('h1', {'class': 'listing-name'})

Or the following:

company = soup.find('h1', class_='listing-name')

Note the underscore after class: class is a reserved keyword in Python, so it cannot be used as a keyword argument name.

More info can be found here: https://www.crummy.com/software/BeautifulSoup/bs4/doc/#attrs
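The dictionary form, the class_ keyword, and a CSS selector should all locate the same element. A small sketch against a shortened copy of the question's markup (the nested rating spans are omitted as an assumption, for brevity):

```python
from bs4 import BeautifulSoup

# Shortened copy of the question's markup (nested rating spans omitted).
html = """
<h1 class="listing-name">
Hôtel Vevey
<span class="entry-feedbacks-summary-title-rating-stars-container bootstrap"></span>
</h1>
"""
soup = BeautifulSoup(html, "html.parser")

# Three ways to filter on the class attribute.
by_dict = soup.find('h1', {'class': 'listing-name'})  # attrs as a dictionary
by_kwarg = soup.find('h1', class_='listing-name')     # class_ sidesteps the keyword
by_css = soup.select_one('h1.listing-name')           # CSS selector

# All three return the same Tag object from the parse tree.
assert by_dict is by_kwarg is by_css

# strip=True strips each text piece and drops the empty ones.
print(by_kwarg.get_text(strip=True))  # Hôtel Vevey
```

Because class is a multi-valued attribute, class_='listing-name' would also match an h1 that carries other classes alongside listing-name.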

answered Sep 26 '22 20:09

narko
