
Show only the text inside a tag with BeautifulSoup

I'm trying to show only the text inside the tag, for example:

<span class="listing-row__price ">$71,996</span>

I want to only show

"$71,996"

My code is:

import requests
from bs4 import BeautifulSoup
from csv import writer

response = requests.get('https://www.cars.com/for-sale/searchresults.action/?mdId=21811&mkId=20024&page=1&perPage=100&rd=99999&searchSource=PAGINATION&showMore=false&sort=relevance&stkTypId=28880&zc=11209')

soup = BeautifulSoup(response.text, 'html.parser')

cars = soup.find_all('span', attrs={'class': 'listing-row__price'})
print(cars)

How can I extract the text from the tags?

R K asked Mar 05 '19 01:03


3 Answers

There are a few ways to get the text inside the tags:

a) Use the .text attribute of the tag.

cars = soup.find_all('span', attrs={'class': 'listing-row__price'})
for tag in cars:
    print(tag.text.strip())

Output

$71,996
$75,831
$71,412
$75,476
....
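Since the live page may change or refuse the request, here is a minimal offline sketch of approach (a). The HTML snippet is made up for illustration and only mimics the markup from the question; it is not the site's real output.

```python
from bs4 import BeautifulSoup

# Made-up sample markup standing in for the live page (an assumption).
html = """
<div>
  <span class="listing-row__price ">
    $71,996
  </span>
  <span class="listing-row__price ">$75,831</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
cars = soup.find_all("span", attrs={"class": "listing-row__price"})

# .text keeps the whitespace around the price; .strip() trims it.
prices = [tag.text.strip() for tag in cars]
print(prices)
```

The trailing space in the class attribute does not prevent the match, because BeautifulSoup treats class as a whitespace-separated list of values.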

b) Use get_text()

for tag in cars:
    print(tag.get_text().strip())
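get_text() also accepts strip and separator arguments, which can replace the separate .strip() call. The sketch below uses a made-up snippet (an assumption, not the real page) to show how they behave, including what happens with nested tags.

```python
from bs4 import BeautifulSoup

# Made-up sample markup for illustration only.
html = '<span class="listing-row__price ">\n  $71,996\n</span><p>hello <b>there</b></p>'
soup = BeautifulSoup(html, "html.parser")

span = soup.find("span")
p = soup.find("p")

# strip=True strips each piece of text before joining the pieces.
price = span.get_text(strip=True)

# With nested tags the stripped pieces are joined with no space by default,
# so pass a separator when the pieces should stay apart.
glued = p.get_text(strip=True)
joined = p.get_text(separator="|", strip=True)

print(price, glued, joined)
```

For a tag that holds a single string, tag.get_text(strip=True) and tag.get_text().strip() give the same result; they differ only when the tag has several text pieces.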

c) If the tag contains only that one string, you can also use any of these:

  • .string
  • .contents[0]
  • next(tag.children)
  • next(tag.strings)
  • next(tag.stripped_strings)

For example:

for tag in cars:
    print(tag.string.strip()) #or uncomment any of the below lines
    #print(tag.contents[0].strip())
    #print(next(tag.children).strip())
    #print(next(tag.strings).strip())
    #print(next(tag.stripped_strings))

Outputs:

$71,996
$75,831
$71,412
$75,476
$77,001
...
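As a quick check that the options in (c) agree when the tag holds a single string, here is a small offline sketch. The one-line HTML is a made-up stand-in for one price element.

```python
from bs4 import BeautifulSoup

# Made-up single price element for illustration only.
soup = BeautifulSoup('<span class="listing-row__price "> $71,996 </span>', "html.parser")
tag = soup.find("span")

results = [
    tag.string.strip(),          # the tag's only string
    tag.contents[0].strip(),     # first child in the list of children
    next(tag.children).strip(),  # first child from the children iterator
    next(tag.strings).strip(),   # first string from the strings generator
    next(tag.stripped_strings),  # already stripped, no .strip() needed
]
print(results)
```

All five entries are the same here only because the span has exactly one text child; with nested tags they can differ or fail.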

Note:

.text and .string are not the same. If the tag has more than one child (for example, text plus a nested element), .string returns None, while .text returns all the text inside the tag, including the text of nested elements.

from bs4 import BeautifulSoup
html="""
<p>hello <b>there</b></p>
"""
soup = BeautifulSoup(html, 'html.parser')
p = soup.find('p')
print(p.string)
print(p.text)

Outputs

None
hello there
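
Two related cases are worth knowing, shown here as a small sketch on made-up markup: a tag whose only child is another tag inherits that child's .string, and an empty tag has a .string of None.

```python
from bs4 import BeautifulSoup

# Made-up markup covering three cases: single nested child, mixed content, empty.
html = "<p><b>there</b></p><p>hello <b>there</b></p><p></p>"
soup = BeautifulSoup(html, "html.parser")
single_child, mixed, empty = soup.find_all("p")

# The only child is <b>, so .string falls through to the <b> tag's string.
print(single_child.string)

# Two children (a string and a tag), so .string is None but get_text() still works.
print(mixed.string, mixed.get_text())

# No children at all, so .string is None and get_text() is an empty string.
print(empty.string, repr(empty.get_text()))
```

Because .string can be None, calling .strip() on it directly raises AttributeError in the mixed and empty cases, which is one reason .text or get_text() is the safer default.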

Bitto Bennichan answered Oct 14 '22 18:10


print([x.text for x in cars])

C8H10N4O2 answered Oct 14 '22 18:10


Actually, the request is not returning the page you expect. When I run it, the response status code is 500, which is a server-side error rather than a network problem, so there is no listing data to parse.

What you are missing is a User-Agent header, which needs to be sent along with the request.

import requests
import re #regex library
from bs4 import BeautifulSoup

headers = {
"User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36"
}

crawl_url = 'https://www.cars.com/for-sale/searchresults.action/?mdId=21811&mkId=20024&page=1&perPage=100&rd=99999&searchSource=PAGINATION&showMore=false&sort=relevance&stkTypId=28880&zc=11209'
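response = requests.get(crawl_url, headers=headers)

# Parse the returned HTML before searching it
soup = BeautifulSoup(response.text, 'html.parser')
cars = soup.find_all('span', attrs={'class': 'listing-row__price'})

for car in cars:
    # Remove all whitespace, including the padding around each price
    print(re.sub(r'\s+', '', car.text))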

Output

$71,412
$75,476
$77,001
$77,822
$107,271
...
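
To catch a failed request early rather than parsing an error page, the status can be checked before handing the body to BeautifulSoup. The sketch below runs offline: it builds a request without sending it and uses a hand-made Response object. The helper name text_or_raise, the example.com URL, and the shortened User-Agent string are illustrative assumptions, not part of the answer above.

```python
import requests

# Illustrative User-Agent value (an assumption); any realistic browser string works.
HEADERS = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64)"}

def text_or_raise(response):
    """Hypothetical helper: return the body, or raise on a 4xx/5xx status."""
    response.raise_for_status()  # raises requests.HTTPError for error statuses
    return response.text

# Prepare a request without sending it, to inspect the headers that would go out.
session = requests.Session()
session.headers.update(HEADERS)
prepared = session.prepare_request(requests.Request("GET", "https://example.com/"))
sent_agent = prepared.headers["User-Agent"]

# Hand-built Response standing in for a server that answered with 500.
fake = requests.Response()
fake.status_code = 500
try:
    text_or_raise(fake)
    outcome = "ok"
except requests.HTTPError:
    outcome = "http error"

print(sent_agent, outcome)
```

In a real script, text_or_raise(session.get(crawl_url)) would take the place of the fake response, so a 500 stops the script with a clear error and find_all never runs on an error page.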

Pankaj answered Oct 14 '22 19:10