How to generate an XPath query matching a specific element in Jsoup?

Hi, this is my web page:

<html>
    <head>
    </head>
    <body>
        <div> text div 1</div>
        <div>
            <span>text of first span </span>
            <span>text of second span </span>
        </div>
        <div> text div 3 </div>
    </body>
</html>

I'm using jsoup to parse it, then I iterate over all elements in the page and build their paths:

    Document doc = Jsoup.parse(new File("C:\\Users\\HC\\Desktop\\dataset\\index.html"), "UTF-8");
    Elements elements = doc.body().select("*");
    ArrayList<String> all = new ArrayList<>();
    for (Element element : elements) {
        // Only elements that directly contain text are reported.
        if (!element.ownText().isEmpty()) {
            StringBuilder path = new StringBuilder(element.nodeName());
            String value = element.ownText();
            Elements p_el = element.parents();

            // parents() lists the closest parent first, so prepend each one.
            for (Element el : p_el) {
                path.insert(0, el.nodeName() + '/');
            }
            all.add(path + " = " + value + "\n");
            System.out.println(path + " = " + value);
        }
    }

    return all;

My code gives me this result:

html/body/div = text div 1
html/body/div/span = text of first span
html/body/div/span = text of second span
html/body/div = text div 3

What I actually want is a result like this:

html/body/div[1] = text div 1
html/body/div[2]/span[1] = text of first span
html/body/div[2]/span[2] = text of second span
html/body/div[3] = text div 3

Could anyone give me an idea of how to get this result? Thanks in advance.

asked Mar 26 '16 09:03

kivok94

1 Answer

As requested, here is an idea, although I'm fairly sure there are better ways to get the XPath for a given node, for example using XSLT as in the answer to "Generate/get xpath from XML node java".

Here is a possible solution based on your current attempt.

For each element (including each parent), check whether its parent has more than one child with that name. Pseudo code (XPath-style notation, not jsoup's CSS selector syntax): if (count(el.select('../' + el.nodeName())) > 1)
If so, count the preceding siblings with the same name and add 1:
count(el.select('preceding-sibling::' + el.nodeName())) + 1
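
The idea above can be sketched in plain jsoup without any XPath support, by walking the sibling chain of each element. This is only a minimal sketch, not a definitive implementation: the class name `XPathPaths` and the helpers `xpathOf` and `pathsWithText` are hypothetical names chosen for illustration, and it assumes an index should be added only when an element has at least one sibling with the same tag name, as in the desired output.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; the names are illustrative, not part of jsoup.
class XPathPaths {

    /** Builds a path such as html/body/div[2]/span[1] for the given element. */
    static String xpathOf(Element element) {
        StringBuilder path = new StringBuilder();
        // Walk from the element up to (but not including) the Document root.
        for (Element cur = element; cur != null && !(cur instanceof Document); cur = cur.parent()) {
            String name = cur.tagName();

            // Count same-named siblings before the current element.
            int before = 0;
            for (Element s = cur.previousElementSibling(); s != null; s = s.previousElementSibling()) {
                if (s.tagName().equals(name)) before++;
            }
            // Count same-named siblings after the current element.
            int after = 0;
            for (Element s = cur.nextElementSibling(); s != null; s = s.nextElementSibling()) {
                if (s.tagName().equals(name)) after++;
            }

            // Assumption: only add [n] when the name is ambiguous among siblings.
            String step = (before + after > 0) ? name + "[" + (before + 1) + "]" : name;
            path.insert(0, path.length() == 0 ? step : step + "/");
        }
        return path.toString();
    }

    /** Returns "path = text" for every element under body that has text of its own. */
    static List<String> pathsWithText(Document doc) {
        List<String> all = new ArrayList<>();
        for (Element element : doc.body().select("*")) {
            String value = element.ownText();
            if (!value.isEmpty()) {
                all.add(xpathOf(element) + " = " + value);
            }
        }
        return all;
    }

    public static void main(String[] args) {
        // Inline copy of the page from the question, so the sketch runs without a file.
        String html = "<html><head></head><body>"
                + "<div> text div 1</div>"
                + "<div><span>text of first span </span><span>text of second span </span></div>"
                + "<div> text div 3 </div>"
                + "</body></html>";
        for (String line : pathsWithText(Jsoup.parse(html))) {
            System.out.println(line);
        }
    }
}
```

To use it with the file from the question, pass the `Document` returned by `Jsoup.parse(new File(...), "UTF-8")` to `pathsWithText`. Counting preceding siblings on every step costs time proportional to the number of siblings, which should be fine for ordinary pages; for very wide trees, the indices could be computed once per parent instead.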

answered Nov 01 '22 11:11

hr_117
