Show text inside the tags BeautifulSoup

Question

I'm trying to show only the text inside the tag, for example:

<span class="listing-row__price ">$71,996</span>

I want to only show

"$71,996"

My code is:

import requests
from bs4 import BeautifulSoup
from csv import writer

response = requests.get('https://www.cars.com/for-sale/searchresults.action/?mdId=21811&mkId=20024&page=1&perPage=100&rd=99999&searchSource=PAGINATION&showMore=false&sort=relevance&stkTypId=28880&zc=11209')

soup = BeautifulSoup(response.text, 'html.parser')

cars = soup.find_all('span', attrs={'class': 'listing-row__price'})
print(cars)

How can I extract the text from the tags?

Bitto Bennichan · Accepted Answer

To get the text within the tags, there are a few approaches:

a) Use the .text attribute of the tag.

cars = soup.find_all('span', attrs={'class': 'listing-row__price'})
for tag in cars:
    print(tag.text.strip())

Output

$71,996
$75,831
$71,412
$75,476
....

b) Use get_text()

for tag in cars:
    print(tag.get_text().strip())
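
As a minimal sketch of a related option: get_text() also accepts strip=True, which trims the whitespace itself so the separate .strip() call is not needed. The HTML below is made-up sample markup standing in for the cars.com page, so the snippet runs offline.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the live page
html = """
<span class="listing-row__price ">
    $71,996
</span>
<span class="listing-row__price ">$75,831</span>
"""

soup = BeautifulSoup(html, "html.parser")
cars = soup.find_all("span", attrs={"class": "listing-row__price"})

# strip=True trims leading/trailing whitespace from each string in the tag
prices = [tag.get_text(strip=True) for tag in cars]
for price in prices:
    print(price)
```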

c) If the tag contains only that one string, you can also use any of these options

.string
.contents[0]
next(tag.children)
next(tag.strings)
next(tag.stripped_strings)

For example:

for tag in cars:
    print(tag.string.strip())  # or uncomment any of the lines below
    #print(tag.contents[0].strip())
    #print(next(tag.children).strip())
    #print(next(tag.strings).strip())
    #print(next(tag.stripped_strings))

Outputs:

$71,996
$75,831
$71,412
$75,476
$77,001
...
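
One caveat with the options in (c): they assume the tag actually contains a string. Here is a rough sketch of what happens with an empty tag, using invented sample markup. In that case .string is None (so calling .strip() on it would raise AttributeError), .contents[0] would raise IndexError, and next() would raise StopIteration unless it is given a default.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second span has no price text
html = (
    '<span class="listing-row__price ">$71,996</span>'
    '<span class="listing-row__price "></span>'
)
soup = BeautifulSoup(html, "html.parser")
cars = soup.find_all("span", attrs={"class": "listing-row__price"})

prices = []
for tag in cars:
    # The second argument to next() is returned when the tag has no strings
    prices.append(next(tag.stripped_strings, None))

print(prices)

empty = cars[1]
print(empty.string)            # None, because the tag has no children
print(repr(empty.get_text()))  # get_text() returns an empty string instead
```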

Note:

.text and .string are not the same. If the tag contains more than one child (for example, text plus a nested element), .string returns None, while .text returns all the text inside the tag, including the text of nested elements.

from bs4 import BeautifulSoup
html="""
<p>hello <b>there</b></p>
"""
soup = BeautifulSoup(html, 'html.parser')
p = soup.find('p')
print(p.string)
print(p.text)

Outputs

None
hello there
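
To build on the note, here is a small sketch of two related behaviours: .string does look through a single nested child, and get_text() takes a separator argument that controls how the pieces of text are joined. The extra <div> is added purely for illustration.

```python
from bs4 import BeautifulSoup

html = "<p>hello <b>there</b></p><div><b>only child</b></div>"
soup = BeautifulSoup(html, "html.parser")
p = soup.find("p")
div = soup.find("div")

print(p.string)                    # None: <p> has two children (text and <b>)
print(div.string)                  # a single child tag is followed to its string
print(list(p.stripped_strings))    # each piece of text, whitespace-trimmed
print(p.get_text(separator="|"))   # pieces joined with the given separator
```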

C8H10N4O2 · Answer

print( [x.text for x in cars] )
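
The list comprehension above prints the list itself, including any whitespace around each string. A variant, sketched here on made-up sample markup, strips each value and prints one price per line.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the scraped page
html = """
<span class="listing-row__price "> $71,412 </span>
<span class="listing-row__price "> $75,476 </span>
"""
soup = BeautifulSoup(html, "html.parser")
cars = soup.find_all("span", attrs={"class": "listing-row__price"})

raw = [x.text for x in cars]            # surrounding whitespace is kept
clean = [x.text.strip() for x in cars]  # surrounding whitespace is removed

print(raw)
print("\n".join(clean))
```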

Pankaj · Answer

Actually, the request is not returning the page. The response code I see is 500, which is a server-side error rather than a network issue, so you are not getting any data to parse.

What you are missing is a User-Agent, which you need to send in the headers along with the request.

import requests
import re #regex library
from bs4 import BeautifulSoup

headers = {
"User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36"
}

crawl_url = 'https://www.cars.com/for-sale/searchresults.action/?mdId=21811&mkId=20024&page=1&perPage=100&rd=99999&searchSource=PAGINATION&showMore=false&sort=relevance&stkTypId=28880&zc=11209'
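response = requests.get(crawl_url, headers=headers)

# Parse the returned HTML so the tags can be searched
soup = BeautifulSoup(response.text, 'html.parser')

cars = soup.find_all('span', attrs={'class': 'listing-row__price'})

for car in cars:
    # Remove all whitespace, including newlines, from the price text
    print(re.sub(r'\s+', '', car.text))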

Output

$71,412
$75,476
$77,001
$77,822
$107,271
...
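
Since the problem here was a failed request rather than the parsing, it may help to check the status before handing the body to BeautifulSoup. This is only a sketch: parse_if_ok and fetch_soup are hypothetical helper names, and the timeout value is an arbitrary assumption. raise_for_status() is the requests method that raises HTTPError on 4xx and 5xx responses.

```python
import requests
from bs4 import BeautifulSoup


def parse_if_ok(response):
    """Return parsed soup, or raise requests.HTTPError on a 4xx/5xx status."""
    response.raise_for_status()
    return BeautifulSoup(response.text, "html.parser")


def fetch_soup(url, headers, timeout=10):
    """Fetch url with the given headers and parse it (hypothetical helper)."""
    response = requests.get(url, headers=headers, timeout=timeout)
    return parse_if_ok(response)
```

With this in place, fetch_soup(crawl_url, headers) would stop with an HTTPError on a 500 response instead of silently returning a page with no matching tags.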

Tags: python, python-3.x, beautifulsoup, web-scraping

R K
