I have this code that fetches some text from a page using BeautifulSoup:
soup = BeautifulSoup(html)
body = soup.find('div', {'id': 'body'})
print(body)
I would like to turn this into a reusable function that takes some HTML text and the tag and attributes to match, like the following:
def parse(html, atrs):
    soup = BeautifulSoup(html)
    body = soup.find(atrs)
    return body
But if I make a call like this:
parse(htmlpage, ('div', {'id': 'body'}))
or like this:
parse(htmlpage, ['div', {'id': 'body'}])
I get only the div element; the id attribute filter ({'id': 'body'}) seems to get ignored.
Is there a way to fix this?
def parse(html, *atrs):
    soup = BeautifulSoup(html)
    body = soup.find(*atrs)
    return body
And then:
parse(htmlpage, 'div', {'id':'body'})
I think you just need to add an asterisk here:
body = soup.find(*atrs)
Without the asterisk you are passing a single parameter which is a tuple:
body = soup.find(('div' , {'id':'body'}))
With the asterisk the tuple is expanded out and the statement becomes equivalent to what you want:
body = soup.find('div' , {'id':'body'})
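To see the difference without needing BeautifulSoup installed, here is a minimal sketch. The find function below is a hypothetical stand-in for soup.find (not the real method); it only reports which arguments it received, and the name/attrs parameter names are assumed for illustration.

```python
def find(name=None, attrs=None):
    """Hypothetical stand-in for soup.find: returns the arguments it received."""
    return (name, attrs)

args = ('div', {'id': 'body'})

# Without the asterisk, the whole tuple is bound to the first parameter
# and the second parameter keeps its default value.
without_star = find(args)

# With the asterisk, the tuple is unpacked into two positional arguments.
with_star = find(*args)

print(without_star)
print(with_star)
```

In the first call the attribute dictionary never reaches the second parameter, which would explain why the id filter appears to be ignored.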
See this article for more information on using the *args notation, and the related **kwargs.
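As a rough illustration of how *args and **kwargs work together, the sketch below forwards both positional and keyword arguments to another function. Both find and forward are hypothetical names made up for this example, and the recursive keyword is just a sample argument, not a statement about what your version of BeautifulSoup accepts.

```python
def find(name=None, attrs=None, **kwargs):
    """Hypothetical stand-in for soup.find: returns the arguments it received."""
    return (name, attrs, kwargs)

def forward(*args, **kwargs):
    # Inside the function, args is a tuple of the positional arguments
    # and kwargs is a dict of the keyword arguments.
    # The * and ** in the call below unpack them again for find.
    return find(*args, **kwargs)

result = forward('div', {'id': 'body'}, recursive=False)
print(result)
```

A wrapper written this way does not need to know in advance which arguments the wrapped function takes; it simply passes everything through.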