I'm using BeautifulSoup under Python for quite a bit of data scraping and cleaning, and I often append .text.strip() to a soup.find command. Example: foo_stuff = soup.find("foo").text.strip()
In certain cases a soup.find does not find anything and returns None, so the resulting .text.strip() breaks with an AttributeError. As I see it I can handle this a few ways:
- Write .find queries that always return something -- I am not a clever enough person to frame my queries like this in a clean fashion.
- Wrap every .text.strip() in a try/except -- Code is ugly.
- Add a .myfind command that does something similar -- This involves me patching things and potentially throwing off collaborators.
Do other folks out there have better or cleverer solutions for this?
Edit: Right now I'm using a boring ol' function to guard .text.strip() with a None check:

    def text_strip(soup_search):
        # soup.find returns None when nothing matches
        if soup_search is not None:
            return soup_search.text.strip()
        else:
            return ""
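For comparison, here is a minimal sketch of the try/except flavor of the same helper, so the except lives in one place instead of at every call site. The name text_strip_safe and the default parameter are made up for illustration, not anything BeautifulSoup provides.

```python
def text_strip_safe(node, default=""):
    """Return node.text.strip(), or `default` when node has no usable .text.

    Hypothetical helper: `node` is expected to be whatever soup.find()
    returned, which is None when nothing matched.
    """
    try:
        return node.text.strip()
    except AttributeError:
        # Raised for None (no .text attribute) and for a .text that is None.
        return default
```

One caveat with this style: it also swallows an AttributeError caused by passing in the wrong kind of object, so the explicit None check is arguably easier to debug.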
How about writing a plain old function?

    def find_stripped(soup, what):
        found = soup.find(what)
        if found is not None:
            return found.text.strip()
        # falls through and returns None on a miss; maybe:
        # return ""

Now you can write: foo_stuff = find_stripped(soup, "foo")
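If the commented-out return "" is a choice you want to make per call, one possible variation is to take the fallback as a keyword argument and forward everything else to soup.find. This is only a sketch: the default keyword is an assumption added here, and the function relies on nothing beyond soup having a .find method that returns None on a miss.

```python
def find_stripped(soup, *args, default=None, **kwargs):
    """Run soup.find(*args, **kwargs) and return the stripped text of the hit.

    `default` (a hypothetical keyword, not part of BeautifulSoup) is
    returned when find() comes back with None.
    """
    found = soup.find(*args, **kwargs)
    if found is None:
        return default
    return found.text.strip()
```

Usage would look like foo_stuff = find_stripped(soup, "foo", default=""), and any extra arguments given to find_stripped are passed straight through to soup.find.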
I think the safest way is to check whether .find() returned an instance of Tag.

    from bs4.element import Tag

    foo_stuff = soup.find("foo")
    if isinstance(foo_stuff, Tag):
        # do something with foo_stuff, e.g. read its text
        foo_text = foo_stuff.text.strip()
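A self-contained sketch of the isinstance check, assuming bs4 is installed. The sample markup and the tag_text helper name are invented for illustration.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Made-up sample markup: it contains a <foo> element but no <bar>.
html = "<root><foo>  hello  </foo></root>"
soup = BeautifulSoup(html, "html.parser")


def tag_text(node, default=""):
    """Return the stripped text only when node is a real Tag (hypothetical helper)."""
    if isinstance(node, Tag):
        return node.text.strip()
    return default


hit = tag_text(soup.find("foo"))   # <foo> exists, so this is its stripped text
miss = tag_text(soup.find("bar"))  # no <bar>, find() gives None, so the default is used
```

Compared with a plain None check, this also rejects anything else that is not a Tag, which is the extra safety the isinstance approach buys.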