I'm trying to show only the text inside the tag, for example:
<span class="listing-row__price ">$71,996</span>
I want to only show
"$71,996"
My code is:
import requests
from bs4 import BeautifulSoup
from csv import writer
response = requests.get('https://www.cars.com/for-sale/searchresults.action/?mdId=21811&mkId=20024&page=1&perPage=100&rd=99999&searchSource=PAGINATION&showMore=false&sort=relevance&stkTypId=28880&zc=11209')
soup = BeautifulSoup(response.text, 'html.parser')
cars = soup.find_all('span', attrs={'class': 'listing-row__price'})
print(cars)
How can I extract the text from the tags?
To get the text within the tags, there are a couple of approaches:
a) Use the .text attribute of the tag.
cars = soup.find_all('span', attrs={'class': 'listing-row__price'})
for tag in cars:
    print(tag.text.strip())
Output
$71,996
$75,831
$71,412
$75,476
....
b) Use get_text()
for tag in cars:
    print(tag.get_text().strip())
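As a small sketch of option b), get_text() also accepts strip and separator arguments, which can replace the separate .strip() call. The HTML below is a made-up sample (not the real page), reusing the span from the question and a nested tag for contrast.

```python
from bs4 import BeautifulSoup

# Made-up sample markup: the span from the question (with extra whitespace)
# plus a tag that contains a nested element.
html = '<span class="listing-row__price ">\n  $71,996\n</span><p>hello <b>there</b></p>'
soup = BeautifulSoup(html, "html.parser")

# strip=True trims whitespace around each piece of text before joining.
span = soup.find("span", class_="listing-row__price")
price = span.get_text(strip=True)
print(price)  # $71,996

# With nested tags, each piece is stripped separately, so pass a separator
# to keep the words apart.
p = soup.find("p")
joined = p.get_text(separator=" ", strip=True)
print(joined)  # hello there
```

Without the separator, the second call would join the stripped pieces directly and return "hellothere".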
c) If the tag contains only that one string, you can also use any of these:
.string
.contents[0]
next(tag.children)
next(tag.strings)
next(tag.stripped_strings)
i.e.
for tag in cars:
    print(tag.string.strip())  # or uncomment any of the lines below
    #print(tag.contents[0].strip())
    #print(next(tag.children).strip())
    #print(next(tag.strings).strip())
    #print(next(tag.stripped_strings))
Outputs:
$71,996
$75,831
$71,412
$75,476
$77,001
...
Note: .text and .string are not the same. If the tag contains more than one child (for example, text plus a nested tag), .string returns None, while .text returns all the text inside the tag, including text from nested tags.
from bs4 import BeautifulSoup
html="""
<p>hello <b>there</b></p>
"""
soup = BeautifulSoup(html, 'html.parser')
p = soup.find('p')
print(p.string)
print(p.text)
Outputs
None
hello there
print( [x.text for x in cars] )
Actually, the request is not returning the page. As I see it, the response code is 500, which is a server error, so you are not getting any data.
What you are missing is a User-Agent, which you need to send in the headers along with the request.
import requests
import re #regex library
from bs4 import BeautifulSoup
headers = {
"User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36"
}
crawl_url = 'https://www.cars.com/for-sale/searchresults.action/?mdId=21811&mkId=20024&page=1&perPage=100&rd=99999&searchSource=PAGINATION&showMore=false&sort=relevance&stkTypId=28880&zc=11209'
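response = requests.get(crawl_url, headers=headers)
soup = BeautifulSoup(response.text, 'html.parser')  # parse the response before searching it
cars = soup.find_all('span', attrs={'class': 'listing-row__price'})
for car in cars:
    # remove all whitespace from the price text
    print(re.sub(r'\s+', '', car.text))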
$71,412
$75,476
$77,001
$77,822
$107,271
...
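Building on the answer above, here is a rough sketch that separates fetching from parsing and fails loudly on an error status instead of parsing an empty page. The names HEADERS, extract_prices and fetch_prices, and the timeout value, are my own assumptions and not part of the original answers. The network call is only defined here, and the demonstration runs on a made-up HTML sample.

```python
import requests
from bs4 import BeautifulSoup

# Header value copied from the answer above; HEADERS is a hypothetical name.
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36"
}


def extract_prices(html):
    """Return the stripped text of every price span in the given HTML."""
    soup = BeautifulSoup(html, "html.parser")
    spans = soup.find_all("span", attrs={"class": "listing-row__price"})
    return [span.get_text(strip=True) for span in spans]


def fetch_prices(url, timeout=10):
    """Download the page (hypothetical helper) and extract its prices."""
    response = requests.get(url, headers=HEADERS, timeout=timeout)
    # Raises requests.HTTPError for 4xx/5xx responses such as the 500 above.
    response.raise_for_status()
    return extract_prices(response.text)


# Made-up sample instead of a live request, so the sketch runs offline.
sample = (
    '<span class="listing-row__price ">$71,996</span>'
    '<span class="listing-row__price ">\n $75,831\n</span>'
)
prices = extract_prices(sample)
print(prices)  # ['$71,996', '$75,831']
```

To use it against the real page, you would call fetch_prices(crawl_url), assuming the site still serves the same markup.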