 

How to obtain href values from a div using xpath?

Tags: html, css, xpath

I have a div like this:

    <div class="widget-archive-monthly widget-archive widget"> 
    <h3 class="widget-header">Monthly <a href="http://myblog.com/blogs/my_name/archives.html">Archives</a></h3> 
    <div class="widget-content"> 
        <ul> 
            <li><a href="http://myblog.com/blogs/my_name/2010/10/">October 2010</a></li> 
            <li><a href="http://myblog.com/blogs/my_name/2010/09/">September 2010</a></li> 
            <li><a href="http://myblog.com/blogs/my_name/2010/08/">August 2010</a></li> 
            <li><a href="http://myblog.com/blogs/my_name/2010/07/">July 2010</a></li> 
            <li><a href="http://myblog.com/blogs/my_name/2010/06/">June 2010</a></li> 
            <li><a href="http://myblog.com/blogs/my_name/2010/05/">May 2010</a></li> 
            <li><a href="http://myblog.com/blogs/my_name/2010/04/">April 2010</a></li> 
            <li><a href="http://myblog.com/blogs/my_name/2010/03/">March 2010</a></li> 
            <li><a href="http://myblog.com/blogs/my_name/2010/02/">February 2010</a></li> 
            <li><a href="http://myblog.com/blogs/my_name/2010/01/">January 2010</a></li> 
            <li><a href="http://myblog.com/blogs/my_name/2009/12/">December 2009</a></li> 
            <li><a href="http://myblog.com/blogs/my_name/2009/11/">November 2009</a></li> 
            <li><a href="http://myblog.com/blogs/my_name/2009/10/">October 2009</a></li> 
            <li><a href="http://myblog.com/blogs/my_name/2009/09/">September 2009</a></li> 
            <li><a href="http://myblog.com/blogs/my_name/2009/08/">August 2009</a></li> 
            <li><a href="http://myblog.com/blogs/my_name/2009/07/">July 2009</a></li> 
            <li><a href="http://myblog.com/blogs/my_name/2009/06/">June 2009</a></li> 
            <li><a href="http://myblog.com/blogs/my_name/2009/05/">May 2009</a></li> 
            <li><a href="http://myblog.com/blogs/my_name/2009/04/">April 2009</a></li> 
            <li><a href="http://myblog.com/blogs/my_name/2009/03/">March 2009</a></li> 
            <li><a href="http://myblog.com/blogs/my_name/2009/02/">February 2009</a></li> 
        </ul> 
    </div> 
    </div>

I'm trying to get the href values inside the widget-content div.

How would I target these links using XPath and ignore any other link on the page, such as the one for "Archives", so that I end up with just these values:

        http://myblog.com/blogs/my_name/2010/10/
        http://myblog.com/blogs/my_name/2010/09/
        http://myblog.com/blogs/my_name/2010/08/
        http://myblog.com/blogs/my_name/2010/07/
        ... etc ...

August asked Oct 31 '10 17:10




1 Answer

    //div[@class="widget-content"]//a/@href

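That should give you the href values of only the links inside the widget-content div. The predicate matches a div whose class attribute is exactly "widget-content", so the "Archives" link, which sits in the h3 outside that div, is not selected.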
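A minimal sketch of how this selection might be applied in Python, assuming the markup is well-formed enough to parse as XML. It uses only the standard library's xml.etree.ElementTree, which supports a subset of XPath and cannot select attributes with /@href, so the sketch selects the a elements and reads href from each one. The names SAMPLE and extract_hrefs are hypothetical, and the sample is a shortened copy of the markup from the question. With a full XPath engine such as lxml, the expression could be passed to its xpath() method unchanged.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the markup from the question (assumed to parse as XML).
SAMPLE = """
<div class="widget-archive-monthly widget-archive widget">
  <h3 class="widget-header">Monthly <a href="http://myblog.com/blogs/my_name/archives.html">Archives</a></h3>
  <div class="widget-content">
    <ul>
      <li><a href="http://myblog.com/blogs/my_name/2010/10/">October 2010</a></li>
      <li><a href="http://myblog.com/blogs/my_name/2010/09/">September 2010</a></li>
      <li><a href="http://myblog.com/blogs/my_name/2010/08/">August 2010</a></li>
    </ul>
  </div>
</div>
"""


def extract_hrefs(markup):
    """Return the href of every <a> nested inside div.widget-content."""
    root = ET.fromstring(markup)
    # ElementTree equivalent of //div[@class="widget-content"]//a;
    # the attribute step (/@href) is done in Python instead.
    links = root.findall(".//div[@class='widget-content']//a")
    return [a.get("href") for a in links if a.get("href") is not None]


hrefs = extract_hrefs(SAMPLE)
for href in hrefs:
    print(href)
```

Real-world pages are often not valid XML, in which case an HTML-aware parser would be needed before the XPath step.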

Neeme Praks answered Sep 24 '22 18:09