beautifulsoup 4 + python: string returns 'None'

Question

I'm trying to parse some HTML with BeautifulSoup 4 and Python 2.7.6, but .string returns None. The HTML I'm trying to parse is:

<div class="booker-booking">
    2&nbsp;rooms
    &#0183;
    USD&nbsp;0
    <!-- Commission: USD  -->
</div>

The Python snippet I have is:

data = soup.find('div', class_='booker-booking').string

I've also tried the following two:

data = soup.find('div', class_='booker-booking').text
data = soup.find('div', class_='booker-booking').contents[0]

Both of these return:

u'
		2\xa0rooms 
		\xb7
		USD\xa00

I'm ultimately trying to get the first line into a variable just saying "2 Rooms", and the third line into another variable just saying "USD 0".

jfs · Accepted Answer

.string returns None because the div has more than one child: besides the text node there is also a comment, and .string is only defined when a tag has exactly one child.
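To see the difference in isolation, here is a minimal sketch. The two tiny HTML strings are made up for illustration, and it assumes bs4 is installed and uses the built-in "html.parser":

```python
from bs4 import BeautifulSoup

# One child (a single text node): .string returns that text.
single = BeautifulSoup("<div>2 rooms</div>", "html.parser").div
print(repr(single.string))

# Two children (a text node plus a comment): there is no single
# string to return, so .string is None.
mixed = BeautifulSoup("<div>2 rooms<!-- note --></div>", "html.parser").div
print(len(mixed.contents))
print(repr(mixed.string))
```

Since the div in the question has both a text node and a comment, the text has to be collected from the text nodes directly.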

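from bs4 import BeautifulSoup, Comment

# html holds the markup from the question; naming the parser avoids
# the "no parser was explicitly specified" warning
soup = BeautifulSoup(html, 'html.parser')
div = soup.find('div', 'booker-booking')
# join the text nodes, skipping comments
text = " ".join(div.find_all(text=lambda t: not isinstance(t, Comment)))
# -> u'\n    2\xa0rooms\n    \xb7\n    USD\xa00\n     \n'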

To collapse all whitespace, including the non-breaking spaces (\xa0) and newlines, into single spaces:

text = " ".join(text.split())
# -> u'2 rooms \xb7 USD 0'
print text
# -> 2 rooms · USD 0
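Why split() with no argument matters here: it splits on any Unicode whitespace, which includes \xa0, whereas split(" ") only splits on the plain ASCII space. A small sketch using only the standard library (the raw string is copied from the output above):

```python
raw = u"\n    2\xa0rooms\n    \xb7\n    USD\xa00\n     \n"

# No-argument split: newlines, runs of spaces and \xa0 all act as separators.
clean = u" ".join(raw.split())
print(repr(clean))

# Splitting on a literal space leaves the non-breaking space in place.
pieces = u"2\xa0rooms".split(u" ")
print(len(pieces))
```

The first result contains only ordinary spaces between the words, while the second stays a single piece because \xa0 is not an ASCII space.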

To get your final variables:

var1, var2 = [s.strip() for s in text.split(u"\xb7")]
# -> u'2 rooms', u'USD 0'
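Putting the steps together, something like the following could work as one function. This is only a sketch: parse_booking is a hypothetical helper name, the HTML constant is the markup from the question, and it assumes the div always contains exactly one middle dot (\xb7) separating the two values. It walks div.descendants instead of calling find_all(text=...), which is just a different way to get the same text nodes.

```python
from bs4 import BeautifulSoup, Comment, NavigableString

HTML = u"""<div class="booker-booking">
    2&nbsp;rooms
    &#0183;
    USD&nbsp;0
    <!-- Commission: USD  -->
</div>"""


def parse_booking(html):
    """Return (rooms, price) from the booker-booking div (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    div = soup.find("div", class_="booker-booking")
    if div is None:
        raise ValueError("no div with class 'booker-booking' found")
    # Comment is a subclass of NavigableString, so exclude it explicitly.
    parts = [
        node for node in div.descendants
        if isinstance(node, NavigableString) and not isinstance(node, Comment)
    ]
    # Collapse newlines, indentation and \xa0 into single spaces.
    text = u" ".join(u" ".join(parts).split())
    # Assumes exactly one middle dot; otherwise the unpacking raises ValueError.
    rooms, price = [s.strip() for s in text.split(u"\xb7")]
    return rooms, price


rooms, price = parse_booking(HTML)
print(rooms)
print(price)
```

If the real pages can have more or fewer separators, the unpacking on the last step is the part to adjust.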

Tags: python, parsing, html-parsing, beautifulsoup

crookedleaf
