How does join work in Python BeautifulSoup?

I am learning Python and BeautifulSoup, and saw this code online:

from BeautifulSoup import BeautifulSoup, SoupStrainer
import re

html = ['<html><body><p align="center"><b><font size="2">Table 1</font></b><table><tr><td>1. row 1, cell 1</td><td>1. row 1, cell 2</td></tr><tr><td>1. row 2, cell 1</td><td>1. row 2, cell 2</td></tr></table><p align="center"><b><font size="2">Table 2</font></b><table><tr><td>2. row 1, cell 1</td><td>2. row 1, cell 2</td></tr><tr><td>2. row 2, cell 1</td><td>2. row 2, cell 2</td></tr></table></html>']
soup = BeautifulSoup(''.join(html))
searchtext = re.compile(r'Table\s+1',re.IGNORECASE)
foundtext = soup.find('p',text=searchtext) # Find the first <p> tag with the search text
table = foundtext.findNext('table') # Find the first <table> tag that follows it
rows = table.findAll('tr')
for tr in rows:
    cols = tr.findAll('td')
    for td in cols:
        try:
            text = ''.join(td.find(text=True))
        except Exception:
            text = ""
        print text+"|",
    print

While everything else is clear, I could not understand how the join works in this line:

    text = ''.join(td.find(text=True))

I searched the BS documentation for join but couldn't find anything, and I couldn't find help online on how join is used in BS either.

Please let me know how that line works. Thanks!

PS: The code above is from another Stack Overflow question, "How can I find a table after a text string using BeautifulSoup in Python?". It's not my homework :)

user1644208 asked Dec 04 '25 12:12


1 Answer

''.join() is a Python string method, not anything BS specific. It lets you join a sequence of strings, using the string you call it on as the separator:

>>> '-'.join(map(str, range(3)))
'0-1-2'
>>> ' and '.join(('bangers', 'mash'))
'bangers and mash'

'' is simply the empty string, and makes joining a whole set of strings together into one large one easier:

>>> ''.join(('5', '4', 'apple', 'pie'))
'54applepie'
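One detail the examples above rely on is that join only accepts strings, which is why the first one wraps range(3) in map(str, ...). A minimal sketch of what happens otherwise (the variable names are made up for illustration):

```python
numbers = [5, 4]

# join() refuses non-string items, so this raises TypeError
try:
    ''.join(numbers)
    join_failed = False
except TypeError:
    join_failed = True

# Converting each item to a string first makes it work
joined = ''.join(str(n) for n in numbers)
print(joined)  # 54
```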

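In the specific case of your example, td.find(text=True) returns only the first text node inside the <td> element, as a single string (or None if the cell contains no text). A string is itself a sequence of one-character strings, so ''.join() simply rebuilds the same string and is effectively a no-op here; if find returns None, join raises a TypeError, which the except clause turns into an empty string. To collect all the text in the cell, including text inside nested HTML elements such as <b> or <i> or <a href="">, use td.findAll(text=True) instead. That returns a list of Python strings, and ''.join() then joins those together into one long string.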
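To see what join does with each kind of input without needing BeautifulSoup installed, here is a small sketch that uses plain Python values as stand-ins. It assumes find(text=True) hands back a single string (or None when nothing matches) and findAll(text=True) hands back a list of strings; the helper name join_text and the sample values are invented for illustration.

```python
def join_text(found):
    """Mimic the try/except around join from the question."""
    try:
        return ''.join(found)
    except Exception:
        return ""

single = '1. row 1, cell 1'              # stand-in for td.find(text=True)
pieces = ['1. row ', '1', ', cell 1']    # stand-in for td.findAll(text=True)

print(join_text(single))      # a lone string comes back unchanged
print(join_text(pieces))      # a list of strings is concatenated
print(repr(join_text(None)))  # None makes join raise, so we get ''
```

Joining a lone string walks over its characters and glues them back together, so the result is the same string; the call only does real work when it is given several strings.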

Martijn Pieters answered Dec 06 '25 02:12


