BeautifulSoup - Handling of cases where variable.find().string returns empty

Tags: python, python-3.x, beautifulsoup

import urllib.request

from bs4 import BeautifulSoup

# Download the page and keep a local copy of the raw bytes
site_response = urllib.request.urlopen("http://site/")
html = site_response.read()
with open("cars.html", "wb") as file:  # open file in binary mode
    file.write(html)

# Parse the saved copy; reading bytes lets BeautifulSoup detect the encoding,
# and naming the parser avoids depending on whichever one is installed
with open("cars.html", "rb") as file:
    soup = BeautifulSoup(file, "html.parser")
output = soup.prettify("latin")  # bytes, encoded as Latin-1
# print(output)  # prints whole file for testing

with open("cars_out.txt", "wb") as file_output:
    file_output.write(output)

fulllist = soup.find_all("div", class_="row vehicle")
# print(fulllist)  # prints each row vehicle class for debug

for item in fulllist:
    item_print = item.find("span", class_="modelYearSort").string
    item_print = item_print + "|" + item.find("span", class_="mmtSort").string
    seller_phone = item.find("span", class_="seller-phone")
    print(seller_phone)
    # item_print = item_print + "|" + item.find("span", class_="seller-phone").string
    item_print = item_print + "|" + item.find("span", class_="priceSort").string
    item_print = item_print + "|" + item.find("span", class_="milesSort").string
    print(item_print)

I have the code above. It parses some HTML and generates a pipe-delimited file. It works fine, except that a few entries are missing one of the elements (seller-phone) in the HTML. Not all entries have a seller phone number.

item.find("span", class_="seller-phone").string

This is where it fails. I am not surprised that the line fails when seller-phone is missing. I get AttributeError: 'NoneType' object has no attribute 'string'.

I am able to call item.find without .string and get back the full block of HTML, but I cannot figure out how to extract the text in those cases.

asked Dec 07 '13 13:12

personalt

1 Answer

You're correct: find() returns None when no matching element is found.

You can add an if/else check to avoid this:

for item in fulllist:
    span = item.find("span", class_="modelYearSort")
    if span is not None:
        item_print = span.string
        item_print = item_print + "|" + item.find("span", class_="mmtSort").string
        seller_phone = item.find("span", class_="seller-phone")
        print(seller_phone)
        # item_print = item_print + "|" + item.find("span", class_="seller-phone").string
        item_print = item_print + "|" + item.find("span", class_="priceSort").string
        item_print = item_print + "|" + item.find("span", class_="milesSort").string
        print(item_print)
    else:
        continue  # No matching span was found; go on to the next item.
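
The same None check can be applied to seller-phone itself, so that a listing without a phone number keeps its row and just gets an empty field. The following is only a sketch: the sample markup, the FIELDS list, and the span_text helper are made up for illustration and are not part of the original code.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: the second listing has no seller-phone span.
SAMPLE_HTML = """
<div class="row vehicle">
  <span class="modelYearSort">2012</span>
  <span class="mmtSort">Honda Civic</span>
  <span class="seller-phone">555-0100</span>
  <span class="priceSort">$9,500</span>
  <span class="milesSort">41,000</span>
</div>
<div class="row vehicle">
  <span class="modelYearSort">2010</span>
  <span class="mmtSort">Ford Focus</span>
  <span class="priceSort">$6,200</span>
  <span class="milesSort">78,000</span>
</div>
"""

# Span classes to export, in output order (taken from the question's loop).
FIELDS = ["modelYearSort", "mmtSort", "seller-phone", "priceSort", "milesSort"]


def span_text(item, css_class, default=""):
    """Return the span's string, or `default` if the span or its string is missing."""
    span = item.find("span", class_=css_class)
    if span is None or span.string is None:
        return default
    return str(span.string)


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
rows = []
for item in soup.find_all("div", class_="row vehicle"):
    # One pipe-delimited line per listing; missing fields become empty strings.
    rows.append("|".join(span_text(item, name) for name in FIELDS))

for row in rows:
    print(row)
```

With this shape every row has the same number of fields, which tends to make the pipe-delimited file easier to load later.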

Or, if you prefer, use a try/except block:

for item in fulllist:
    try:
        item_print = item.find("span", class_="modelYearSort").string
    except AttributeError:
        continue  # find() returned None; skip to the next item.
    else:
        item_print = item_print + "|" + item.find("span", class_="mmtSort").string
        seller_phone = item.find("span", class_="seller-phone")
        print(seller_phone)
        # item_print = item_print + "|" + item.find("span", class_="seller-phone").string
        item_print = item_print + "|" + item.find("span", class_="priceSort").string
        item_print = item_print + "|" + item.find("span", class_="milesSort").string
        print(item_print)
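
The try/except form can also be wrapped around a single optional field, so only that field falls back to a default instead of the whole row being skipped. This is a rough sketch under a few assumptions: the markup and the optional_text helper are hypothetical, and get_text() is used in place of .string because .string is None when a span contains nested tags.

```python
from bs4 import BeautifulSoup

# Hypothetical listing that has no seller-phone span.
item = BeautifulSoup(
    '<div class="row vehicle">'
    '<span class="modelYearSort">2010</span>'
    '<span class="priceSort">$6,200</span>'
    "</div>",
    "html.parser",
).div


def optional_text(item, css_class, default=""):
    """Return the span's text, or `default` when find() returns None."""
    try:
        return item.find("span", class_=css_class).get_text(strip=True)
    except AttributeError:
        # find() returned None, so there is no such span in this listing.
        return default


year = optional_text(item, "modelYearSort")
phone = optional_text(item, "seller-phone")
price = optional_text(item, "priceSort")
line = "|".join([year, phone, price])
print(line)
```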

Hope this helps!

answered Oct 08 '22 07:10

aIKid
