
Python beautiful soup arguments

I have this code that fetches some text from a page using BeautifulSoup:

soup = BeautifulSoup(html)
body = soup.find('div', {'id': 'body'})
print(body)

I would like to turn this into a reusable function that takes some HTML text and the tag and attributes to match, like the following:

def parse(html, atrs):
    soup = BeautifulSoup(html)
    body = soup.find(atrs)
    return body

But if I make a call like this

parse(htmlpage, ('div', {'id': 'body'}))

or like this

parse(htmlpage, ['div', {'id': 'body'}])

I get only the div element; the id attribute seems to be ignored.

Is there a way to fix this?

scott asked Apr 03 '10 12:04


2 Answers

def parse(html, *atrs):
    soup = BeautifulSoup(html)
    body = soup.find(*atrs)
    return body

And then:

parse(htmlpage, 'div', {'id':'body'})

Eli Bendersky answered Nov 15 '22 21:11


I think you just need to add an asterisk here:

body = soup.find(*atrs)

Without the asterisk you are passing a single parameter which is a tuple:

body = soup.find(('div' , {'id':'body'}))

With the asterisk the tuple is expanded out and the statement becomes equivalent to what you want:

body = soup.find('div' , {'id':'body'})

See this article for more information on using the *args notation, and the related **kwargs.
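
To make the difference concrete without depending on BeautifulSoup itself, here is a minimal sketch. The `find` function below is a hypothetical stand-in that only reports what it received (it is not the real `soup.find`), and `parse_packed` / `parse_unpacked` are illustrative names. It also shows `**kwargs` forwarding keyword arguments in the same way.

```python
def find(name=None, attrs=None):
    """Hypothetical stand-in for soup.find: it only reports its arguments."""
    return {"name": name, "attrs": attrs}


def parse_packed(args):
    # No asterisk: the whole tuple arrives as the single `name` argument.
    return find(args)


def parse_unpacked(*args, **kwargs):
    # With asterisks: each positional and keyword argument is forwarded separately.
    return find(*args, **kwargs)


packed = parse_packed(("div", {"id": "body"}))
unpacked = parse_unpacked("div", {"id": "body"})
by_keyword = parse_unpacked("div", attrs={"id": "body"})

print(packed)      # name holds the entire tuple, attrs is None
print(unpacked)    # name is 'div', attrs is the dict
print(by_keyword)  # same result as the positional call
```

In the packed call the attribute dict never reaches the `attrs` parameter, which is consistent with the id filter appearing to be ignored in the question.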

Mark Byers answered Nov 15 '22 22:11