
Python lxml/beautiful soup to find all links on a web page

Tags: python, lxml
I am writing a script that reads a web page and builds a database of links matching certain criteria. Right now I am stuck with lxml, trying to understand how to grab all of the <a href> links from the HTML:

    result = self._openurl(self.mainurl)
    content = result.read()
    html = lxml.html.fromstring(content)
    print lxml.html.find_rel_links(html, 'href')
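Note that `find_rel_links` matches the `rel` attribute of anchors (e.g. `rel="nofollow"`), not `href`, which is why the call above finds nothing. A minimal self-contained sketch of pulling every href out with lxml's XPath support (the inline markup stands in for the fetched page):

```python
import lxml.html

# Sample markup standing in for content read from the target page
content = '<html><body><a href="/a">A</a><a href="/b">B</a></body></html>'
html = lxml.html.fromstring(content)

# '//a/@href' selects the href attribute of every <a> element in the tree
hrefs = html.xpath('//a/@href')
print(hrefs)  # → ['/a', '/b']
```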
Cmag asked May 25 '11 21:05
1 Answer

I want to provide an alternative lxml-based solution.

This solution uses the CSSSelector class provided by lxml.cssselect (which requires the cssselect package):

    import urllib.request
    import lxml.html
    from lxml.cssselect import CSSSelector  # requires the cssselect package

    # Fetch the page and parse it into an element tree
    connection = urllib.request.urlopen('http://www.yourTargetURL/')
    dom = lxml.html.fromstring(connection.read())

    # Build a selector matching every <a> element, then apply it to the tree
    selAnchor = CSSSelector('a')
    foundElements = selAnchor(dom)
    print([e.get('href') for e in foundElements])
吳強福 answered Oct 04 '22 13:10