
JSoup: Retrieve element that does not contain a specific attribute

Tags: java, jsoup

I have a table with the following logic:

  1. The table displays a list of names.
  2. Every row of the form <tr class=hiderow><td class=packagename>...</td></tr> is not visible.

So the table might contain 100 rows, but if 20 of them have class=hiderow, the user sees only 80 rows on the page. I want to retrieve the names from those 80 rows (not all 100), so I need to parse out the rows that do not have class=hiderow. I know how to obtain every name using jsoup, and I see that the documentation lists :not(selector), "elements that do not match the selector", but I am not sure how to use it. Please help.

EDIT: I have figured out how to do it. Please let me know if there is a better way.
EDIT2: Please use the solution below from BalusC. It's much cleaner.

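import java.io.IOException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Returns the names from all rows that are not marked with class "hiderow".
public List<String> obtainPackageName(String urlLink) throws IOException {
    List<String> pdfList = new ArrayList<String>();
    URL url = new URL(urlLink);
    Document doc = Jsoup.parse(url, 3000); // 3000 ms timeout
    Element table = doc.select("table[id=mastertableid]").first();
    if (table == null) {
        return pdfList; // table not found on the page
    }
    for (Element row : table.select("tr")) {
        // hasClass matches a whole class token; className().contains(...)
        // would also match unrelated names such as "hiderow2".
        if (!row.hasClass("hiderow")) {
            Element packageName = row.select("td[class=packagename]").first();
            if (packageName != null) {
                pdfList.add(packageName.text());
            }
        }
    }
    return pdfList; // the original built the list but never returned it
}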
Thang Pham asked May 22 '26 19:05



1 Answer

You need to apply :not() to the element of interest (tr in your case) and pass it an element-relative CSS selector that the element should not match (.hiderow in your case).

So, this should do:

Document document = Jsoup.connect(urlLink).get();
// Select packagename cells only inside rows that do not carry the hiderow class.
Elements packagenames = document.select("#mastertableid tr:not(.hiderow) td.packagename");
List<String> pdfList = new ArrayList<String>();

for (Element packagename : packagenames) {
    pdfList.add(packagename.text());
}
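
To try the selector without a live page, here is a minimal, self-contained sketch. The sample HTML, the class name VisibleRows, and the method visibleNames are made up for illustration. Only the table id, the class names, and the selector come from the question and the snippet above. It assumes jsoup is on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class VisibleRows {

    // Hypothetical sample markup: two visible rows and two hidden rows.
    // The third row carries two classes to show that .hiderow matches
    // a class token, not the whole class attribute.
    static final String SAMPLE_HTML =
          "<table id=\"mastertableid\">"
        + "<tr><td class=\"packagename\">alpha</td></tr>"
        + "<tr class=\"hiderow\"><td class=\"packagename\">beta</td></tr>"
        + "<tr class=\"odd hiderow\"><td class=\"packagename\">gamma</td></tr>"
        + "<tr><td class=\"packagename\">delta</td></tr>"
        + "</table>";

    // Collects the text of every packagename cell whose row is not hidden.
    static List<String> visibleNames(String html) {
        Document doc = Jsoup.parse(html);
        List<String> names = new ArrayList<>();
        for (Element cell : doc.select("#mastertableid tr:not(.hiderow) td.packagename")) {
            names.add(cell.text());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(visibleNames(SAMPLE_HTML)); // prints [alpha, delta]
    }
}
```

Because :not(.hiderow) is attached to tr, the filter applies to the row and the td.packagename part then descends only into the rows that remain.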
BalusC answered May 24 '26 08:05
