 

Using jsoup to get an unordered list

Tags:

java

jsoup

I am reposting this question. I am trying to extract an unordered list. In my previous question I had the format incorrect. The website I am trying to extract the data from is formatted correctly.

<ul>
<li>
<i>
<a class="mw-redirect" title="title1" href="yahoo.com">used to be a best email</a>
</i>
(1999)
</li>
<li>
<i>
<a title="title2" href="google.com">Best search enginee We Will Go</a>
</i>
(1999)
</li>
<li>
<i>
<a title="title3" href="apple.com">Best Phone</a>
</i>
(1990)
</li>
</ul>

I want to print:

title1

google.com

yahoo.com

= used to be a best email Best search email will go Bestphone

Similarly for all the hrefs.

I have looked at the jsoup documentation.

Related question: "jsoup to get the data in a unorderedlist", but that one has formatting issues.

I tried what was suggested there, but it is not working.

I tried:

Document doc = Jsoup.connect(url).get();             
Element link = doc.select("a").last();
String title1 = link.attr("title");

The issue is that this is a big page with a lot of information, and it contains many unordered lists.

The Learner asked Aug 18 '12 18:08



1 Answer

My answer might be more accurate if you formatted your question and specified your requirements better. Is this what you were looking for?

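import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class ListLinks {
    // "throws IOException" is only needed if you switch to Jsoup.connect(url).get()
    public static void main(String[] args) throws IOException
    {
        String html = "<ul><li><i><a class=\"mw-redirect\" title=\"title1\" href=\"yahoo.com\">used to be a best email</a></i>(1999)</li><li><i><a title=\"title2\" href=\"google.com\">Best search enginee We Will Go</a></i>(1999)</li><li><i><a title=\"title3\" href=\"apple.com\">Best Phone</a></i>(1990)</li></ul>";

        // Parse the HTML string into a DOM
        Document doc = Jsoup.parse(html);

        // Select every <a> nested inside <i> inside <li> inside <ul>
        Elements links = doc.select("ul li i a");

        // Print title, href and link text, one link per line
        for (Element element : links) {
            System.out.format("%s %s %s%n", element.attr("title"), element.attr("href"), element.text());
        }
    }
}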

If not, add a sample of the expected output to your question.

Update:

How it works: ul li i a is a CSS selector. It means: take every a element that is located (at any depth) inside an i element, which is inside an li element, which in turn is inside a ul element. (Horrible explanation.)

For this snippet you would get the same result from doc.select("a") as well. But being specific is better when you are parsing data from a real website, because links can appear in different places with different ids/classes, and you are looking for these specific ones.

Yes, if the selected elements have a title, a hyperlink and a text value, it will output that data.
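To make the difference concrete, here is a minimal sketch. The nav block and its "Home" link are made up for the example (they are not from the asker's page), and the list is a shortened copy of the one in the question. The class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class SelectorScopeDemo {

    // Hypothetical page: one unrelated navigation link plus a shortened list.
    static final String HTML =
        "<div id=\"nav\"><a title=\"home\" href=\"example.com\">Home</a></div>"
        + "<ul>"
        + "<li><i><a title=\"title1\" href=\"yahoo.com\">used to be a best email</a></i> (1999)</li>"
        + "<li><i><a title=\"title3\" href=\"apple.com\">Best Phone</a></i> (1990)</li>"
        + "</ul>";

    // Returns the title attribute of every element matched by the CSS selector.
    static List<String> titles(String cssSelector) {
        Document doc = Jsoup.parse(HTML);
        List<String> result = new ArrayList<>();
        for (Element a : doc.select(cssSelector)) {
            result.add(a.attr("title"));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(titles("a"));          // [home, title1, title3]
        System.out.println(titles("ul li i a"));  // [title1, title3]
    }
}
```

The bare "a" selector also picks up the navigation link, while "ul li i a" returns only the links inside the list. On a page with many lists you would probably narrow it further, for example with an id or class on the container if the page provides one.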
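A related detail, shown as a small sketch: when a selected link lacks an attribute, jsoup's attr() returns an empty string rather than null, so the loop just prints a blank in that position. If you only want links that carry a title, an attribute selector such as a[title] filters them up front. The markup and the class/method names below are made up for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.select.Elements;

public class MissingAttributeDemo {

    // Made-up input: the second link has no title attribute.
    static final String HTML =
        "<ul>"
        + "<li><i><a title=\"title1\" href=\"yahoo.com\">first</a></i></li>"
        + "<li><i><a href=\"google.com\">second</a></i></li>"
        + "</ul>";

    // Title of the link at the given index; attr() yields "" when the attribute is absent.
    static String titleAt(int index) {
        Elements links = Jsoup.parse(HTML).select("ul li i a");
        return links.get(index).attr("title");
    }

    // Number of links in the list that actually carry a title attribute.
    static int linksWithTitle() {
        return Jsoup.parse(HTML).select("ul li i a[title]").size();
    }

    public static void main(String[] args) {
        System.out.println("[" + titleAt(1) + "]");  // prints []
        System.out.println(linksWithTitle());        // prints 1
    }
}
```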

ant answered Nov 19 '22 13:11
