
Elegant way to safely .text.strip() in BeautifulSoup?

I'm using BeautifulSoup in Python for a lot of data scraping and cleaning, and I often append .text.strip() to a soup.find call. Example: foo_stuff = soup.find("foo").text.strip()

In some cases soup.find doesn't find anything and returns None, so the chained .text.strip() raises an AttributeError. As I see it, I can handle this a few ways:

  • Write .find queries that always return something -- I am not clever enough to frame my queries like this in a clean fashion.
  • Use try/except statements around every .text.strip() -- the code gets ugly.
  • Patch the .find method to use try/except, or add a .myfind method that does something similar -- this means patching things and potentially throwing off collaborators.
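To make the failure mode and the try/except option concrete, here is a small sketch. It assumes bs4 is installed and uses the built-in "html.parser"; the HTML snippet and the <bar> tag are made up for illustration. It shows that find() returns None for a missing tag and how the per-lookup try/except has to be repeated.

```python
from bs4 import BeautifulSoup

# Illustrative document: contains <bar> but no <foo>
html = "<div><bar> hello </bar></div>"
soup = BeautifulSoup(html, "html.parser")

# find() returns None when no matching element exists...
missing = soup.find("foo")
print(missing)  # None

# ...so chaining .text onto it raises AttributeError, which must be caught
try:
    foo_stuff = soup.find("foo").text.strip()
except AttributeError:
    foo_stuff = ""

# The same boilerplate is needed again for every lookup
try:
    bar_stuff = soup.find("bar").text.strip()
except AttributeError:
    bar_stuff = ""

print(repr(foo_stuff), repr(bar_stuff))  # '' 'hello'
```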

Do other folks out there have better or cleverer solutions to this?

Edit: Right now I'm using a boring ol' function that checks for None before calling .text.strip():

def text_strip(soup_search):
    if soup_search is not None:
        return soup_search.text.strip()
    else:
        return ""
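One limitation worth noting: a helper like text_strip() only guards the final step, so a nested lookup such as soup.find("table").find("td") still raises if the outer find() misses. Below is a sketch of a text_strip() variant with a configurable default, plus a hypothetical find_path() helper (not part of BeautifulSoup) that stops at the first missing level. The table HTML is illustrative, and it assumes bs4 with "html.parser".

```python
from bs4 import BeautifulSoup

def text_strip(soup_search, default=""):
    """Return the stripped text of a find() result, or default if nothing matched."""
    if soup_search is not None:
        return soup_search.text.strip()
    return default

def find_path(soup, *names):
    """Hypothetical helper: follow nested find() calls, returning None at the first miss."""
    node = soup
    for name in names:
        node = node.find(name)
        if node is None:
            return None
    return node

html = "<table><tr><td> 42 </td></tr></table>"
soup = BeautifulSoup(html, "html.parser")

cell = text_strip(find_path(soup, "table", "tr", "td"))
# There is no <thead>, so the walk stops early instead of raising
missing = text_strip(find_path(soup, "table", "thead", "th"), default=None)
print(repr(cell), repr(missing))  # '42' None
```

Passing default=None lets a caller tell "element missing" apart from "element present but empty".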
binarysolo asked Nov 30 '12 01:11



2 Answers

How about writing a plain old function?

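def find_stripped(soup, what):
    found = soup.find(what)
    if found is not None:
        return found.text.strip()
    # Nothing matched: falls through and implicitly returns None.
    # Alternatively, return "" here to mirror an empty-string fallback.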

Now you can: foo_stuff = find_stripped(soup, "foo")
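A possible extension, sketched under the assumption that you also need attribute filters: forwarding *args/**kwargs lets the wrapper accept anything soup.find() accepts (for example class_ or id), and a keyword-only default replaces the commented-out return "". The parameter name default and the sample HTML are illustrative choices, not part of bs4.

```python
from bs4 import BeautifulSoup

def find_stripped(soup, *args, default="", **kwargs):
    """Return soup.find(*args, **kwargs).text.strip(), or default if nothing matches."""
    # Every argument except the keyword-only `default` is passed straight to find()
    found = soup.find(*args, **kwargs)
    if found is None:
        return default
    return found.text.strip()

html = '<div class="price"> $10 </div><span id="name">Widget</span>'
soup = BeautifulSoup(html, "html.parser")

price = find_stripped(soup, "div", class_="price")
name = find_stripped(soup, "span", id="name")
absent = find_stripped(soup, "p")
fallback = find_stripped(soup, "p", default="n/a")
print(price, name, repr(absent), fallback)  # $10 Widget '' n/a
```

Because default is keyword-only, the wrapper consumes it and never forwards it to find().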

9000 answered Sep 29 '22 05:09



I think the safest way is to check whether .find() returned an instance of type Tag.

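from bs4.element import Tag

foo_stuff = soup.find("foo")

if isinstance(foo_stuff, Tag):
    # A matching tag was found, so .text is safe to use
    foo_text = foo_stuff.text.strip()
else:
    # Nothing matched (find() returned None)
    foo_text = ""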
Harman answered Sep 29 '22 05:09
