
Extract href values inside td tags in jsoup

I have the following HTML:

<table class="table">
  <tr>
    <td><a href="url">text1</a></td>
    <td>text2</td>
  </tr>
  <tr>
    <td><a href="url2">text</a></td>
    <td>text</td>
  </tr>
</table>

I want to extract the URL and the link text from every row. I use:

Document doc = Jsoup.connect(url).get();
for (Element table : doc.select("table.table")) {
    for (Element row : table.select("tr")) {
        Elements tds = row.select("td");
        String text1 = tds.get(0).text();
        String url = row.attr("href");
        System.out.println(text1 + "," + url);
    }
}

I get the text1 value, but url comes back empty.

How can I get the url from the td tags?

Thal asked Jun 15 '12 06:06


1 Answer

Your row variable holds the tr element, not the a element, so it has no href attribute. The href lives on the a tag nested inside the td.

Try this instead:

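// select() returns Elements (a list), so take the first matching table.
Element table = doc.select("table.table").first();
// Collect every <a> element nested anywhere inside that table.
Elements links = table.getElementsByTag("a");
for (Element link : links) {
    String url = link.attr("href");
    String text = link.text();
    System.out.println(text + ", " + url);
}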

This is pretty much taken from the jsoup documentation.
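If you would rather keep the row-by-row loop from the question, a minimal sketch along these lines might work. It parses the sample HTML from an inline string instead of fetching a page, and the class name TdLinkExtractor and the helper extractRows are hypothetical names made up for this illustration, not part of jsoup.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class TdLinkExtractor {

    // Hypothetical helper: returns one "text,url" string per row that contains a link.
    static List<String> extractRows(String html) {
        Document doc = Jsoup.parse(html);
        List<String> result = new ArrayList<>();
        for (Element row : doc.select("table.table tr")) {
            // Ask the row for the first <a href> nested inside one of its <td> cells.
            Element link = row.select("td a[href]").first();
            if (link == null) {
                continue; // row without a link, e.g. a header row
            }
            // attr() is called on the <a> element, which actually carries the href.
            result.add(link.text() + "," + link.attr("href"));
        }
        return result;
    }

    public static void main(String[] args) {
        // Same markup as in the question, inlined so the example needs no network access.
        String html = "<table class=\"table\">"
                + "<tr><td><a href=\"url\">text1</a></td><td>text2</td></tr>"
                + "<tr><td><a href=\"url2\">text</a></td><td>text</td></tr>"
                + "</table>";
        for (String line : extractRows(html)) {
            System.out.println(line);
        }
    }
}
```

For the two sample rows this should print text1,url and text,url2. Selecting from each row rather than from the whole table keeps the link paired with its row, which helps if you later need the other td values from the same row as well.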

Alex answered Oct 13 '22 12:10