I would like to parse an HTML file with Python, and the module I am using is BeautifulSoup.
It is said that the function `find_all` is the same as `findAll`. I've tried both of them, but I believe they are different:
```python
import urllib, urllib2, cookielib
from BeautifulSoup import *

site = "http://share.dmhy.org/topics/list?keyword=TARI+TARI+team_id%3A407"
rqstr = urllib2.Request(site)
rq = urllib2.urlopen(rqstr)
fchData = rq.read()
soup = BeautifulSoup(fchData)
t = soup.findAll('tr')
```
Can anyone tell me the difference?
`find` returns only the first tag on the page that satisfies the search condition (or `None` if nothing matches). `find_all` scans the entire document and returns every match.

`find_all` returns a `ResultSet`, which offers index-based access to the found occurrences and can be iterated with a for loop. A broad search usually picks up elements you do not want, so attributes like `id`, `class`, or `value` are used to further refine the search.
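For illustration, here is a minimal BeautifulSoup 4 sketch; the HTML snippet and the `note` class are invented for the example:

```python
from bs4 import BeautifulSoup  # BeautifulSoup 4

html = """
<div>
  <p id="intro">one</p>
  <p class="note">two</p>
  <p class="note">three</p>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# find() returns only the first matching tag (or None if nothing matches)
print(soup.find("p"))                 # <p id="intro">one</p>

# find_all() scans the whole document and returns a ResultSet
notes = soup.find_all("p", class_="note")
print(notes[0].text)                  # index-based access -> "two"
for p in notes:                       # iterable with a for loop
    print(p.text)                     # "two", "three"
```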
findAll("p", {"class": "pagination-container and something"}) , BeautifulSoup would match an element having the exact class attribute value. There is no splitting involved in this case - it just sees that there is an element where the complete class value equals the desired string.
In BeautifulSoup version 4, the methods are exactly the same; the mixed-case versions (`findAll`, `findAllNext`, `nextSibling`, etc.) have all been renamed to conform to the Python style guide, but the old names are still available to make porting easier. See Method Names for a full list.

In new code, you should use the lowercase versions, so `find_all`, etc.
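A quick sketch of the aliasing in BeautifulSoup 4 (the HTML string here is just an invented example):

```python
from bs4 import BeautifulSoup  # BeautifulSoup 4

soup = BeautifulSoup(
    "<table><tr><td>a</td></tr><tr><td>b</td></tr></table>",
    "html.parser",
)

# In BS4, findAll is simply an alias for find_all; both calls
# return the same ResultSet.
print(soup.find_all('tr') == soup.findAll('tr'))  # True
print(soup.find_all('tr'))
```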
In your example however, you are using BeautifulSoup version 3 (discontinued since March 2012, don't use it if you can help it), where only `findAll()` is available. Unknown attribute names (such as `.find_all`, which is only available in BeautifulSoup 4) are treated as if you are searching for a tag by that name. There is no `<find_all>` tag in your document, so `None` is returned for that.
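For reference, here is a sketch of the same fetch-and-parse logic ported to Python 3 and BeautifulSoup 4 (assuming the `bs4` package is installed; the URL is the one from the question):

```python
import urllib.request
from bs4 import BeautifulSoup  # BeautifulSoup 4

site = "http://share.dmhy.org/topics/list?keyword=TARI+TARI+team_id%3A407"
with urllib.request.urlopen(site) as rq:
    fchData = rq.read()

soup = BeautifulSoup(fchData, "html.parser")
t = soup.find_all('tr')    # preferred name in BS4
# t = soup.findAll('tr')   # old alias, still works to ease porting
print(len(t))
```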