
jsoup remove div with a certain class

Tags: java, jsoup

I have a list in jsoup like this:

Elements tbody = new Elements();

tbody might look like this (---- separates elements in tbody list):

<td> 
 <div data-emission="56b2140adb6da7bf3cbf6228" class="mainCell"> 
  <a href="/tv/weather-country-12457/"> <span class="left">16:00</span> 
   <div> 
    <p>Weather - country</p> 
   </div> </a> 
 </div> 
 <div data-emission="56b2140adb6da7bf3cbf6237" class="mainCell shows pending"> 
  <a href="/shows/that's-70-show-550347/epi1201/"> <span class="left">16:10</span> 
   <div> 
    <p>That's 70 show</p> 
    <span class="info">epi. 1201, Show</span> 
   </div> <p class="onAir"> <span>Pending</span> <u></u> <u style="width: 5%"></u> </p> </a> 
 </div> </td>
 ---------------------------------------------------------------------------
 <td> 
 <div data-emission="56b23876db6da7bf3cbf6588" class="mainCell pending"> 
  <a href="/tv/weather-563806/"> <span class="left">16:10</span> 
   <div> 
    <p>Weather</p> 
   </div> <p class="onAir"> <span>Pending</span> <u></u> <u style="width: 51%"></u> </p> </a> 
 </div> 
 <div data-emission="56b23876db6da7bf3cbf6589" class="mainCell"> 
  <a href="/tv/animal-cops-2615/"> <span class="left">16:15</span> 
   <div> 
    <p>Animal Cops</p> 
    <span class="info">epi. 3079, Show</span> 
   </div> </a> 
 </div> 
 <div data-emission="56b23876db6da7bf3cbf658a" class="mainCell shows"> 
  <a href="/show/house-md-1601/odc137/"> <span class="left">16:30</span> 
   <div> 
    <p>House MD</p> 
    <span class="info">epi. 137, Show</span> 
   </div> </a> 
 </div> </td>
 ---------------------------------------------------------------------------
 <td> 
 <div data-emission="56b213b3db6da7bf3cbf61a1" class="mainCell movies pending"> 
  <a href="/movie/star-trek-564170/"> <span class="left">16:00</span> 
   <div> 
    <p>Star Trek</p> 
    <span class="info">Movie</span> 
    <span class="szh prem">| Premiere</span> 
   </div> <p class="onAir"> <span>Pending</span> <u></u> <u style="width: 21%"></u> </p> </a> 
 </div> </td>

My goal is to remove every movie/show that is pending (on air). So in this example I would like to remove the whole enclosing div for each of:

  • That's 70 show
  • Weather (the 16:10 entry)
  • Star Trek

For example, I tried:

for (int i = 0; i < tbody.size(); i++) {
    tbody.get(i).select("div").select("p").select(".onAir").remove();
}

This removes only the p.onAir element itself, not the whole enclosing div. I have tried many approaches without success. I would appreciate any help.

user3529850 asked Feb 23 '16 17:02


2 Answers

It seems that the pending shows also carry the pending CSS class. If this holds in all cases, you can do it very simply:

doc.select("td>div.pending").remove();

This removes every div element with the pending class from the document doc, provided it is a direct child of a td element.

Alternatively, you can keep your approach and match on the p element with the onAir class and the inner text Pending:

doc.select("td>div:has(p.onAir:contains(Pending))").remove();

See the jsoup CSS selector syntax documentation to understand how much these selectors can express.
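To make the two selectors above concrete, here is a minimal sketch that runs both against a trimmed-down copy of the markup from the question. The class name RemovePendingDemo, the helper remainingTitles, and the shortened HTML string are made up for illustration. The sample is wrapped in a table because, as far as I know, jsoup's HTML parser drops a bare td that is not inside a table, in which case td>div would match nothing.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import java.util.List;

public class RemovePendingDemo {
    // Shortened sample (assumption: same structure as the question's markup),
    // wrapped in <table><tr> so the <td> survives HTML parsing.
    static final String HTML =
        "<table><tr><td>"
      + "<div class=\"mainCell\"><a href=\"#\"><div><p>Weather - country</p></div></a></div>"
      + "<div class=\"mainCell shows pending\"><a href=\"#\"><div><p>That's 70 show</p></div>"
      + "<p class=\"onAir\"><span>Pending</span></p></a></div>"
      + "</td></tr></table>";

    // Hypothetical helper: parse, remove everything matching removeSelector,
    // then return the titles of the cells that are left.
    static List<String> remainingTitles(String html, String removeSelector) {
        Document doc = Jsoup.parse(html);
        doc.select(removeSelector).remove();
        // The title <p> sits inside a > div; p.onAir is a direct child of <a>.
        return doc.select("div.mainCell a > div > p").eachText();
    }

    public static void main(String[] args) {
        // Variant 1: rely on the "pending" class of the cell itself.
        System.out.println(remainingTitles(HTML, "td > div.pending"));
        // Variant 2: rely on the nested p.onAir marker and its text.
        System.out.println(remainingTitles(HTML, "td > div:has(p.onAir:contains(Pending))"));
    }
}
```

Both variants should leave only the non-pending cell in this sample. The first is shorter; the second is the safer choice if the pending class turns out not to be present on every on-air cell.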

luksch answered Sep 18 '22 13:09


Try the following code snippet.

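// Collect every programme cell from the list of td elements
Elements mainCells = tbody.select("div.mainCell");
for (int i = 0; i < mainCells.size(); i++) {
    Elements mainCellsP = mainCells.get(i).select("div").select("a").select("p");
    // In the sample markup a pending cell has two <p> elements:
    // the title and the extra p.onAir marker
    if (mainCellsP.size() == 2) {
        // Remove this node from DOM tree
        mainCells.get(i).remove();
    }
}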

First select the node you want to delete, then call the remove() method on that node.
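The same "select the node, then remove it" idea can also start from the p.onAir marker rather than from the number of p elements, which avoids depending on how many paragraphs a cell happens to contain. The following is only a sketch under that assumption: the class RemoveByMarkerDemo, the method removePendingCells, and the shortened HTML are invented for illustration and are not part of the answer above.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class RemoveByMarkerDemo {
    // Shortened sample with two cells in two <td>s; one cell is pending.
    static final String HTML =
        "<table><tr>"
      + "<td><div class=\"mainCell pending\"><a href=\"#\"><div><p>Weather</p></div>"
      + "<p class=\"onAir\"><span>Pending</span></p></a></div></td>"
      + "<td><div class=\"mainCell\"><a href=\"#\"><div><p>Animal Cops</p></div></a></div></td>"
      + "</tr></table>";

    // Hypothetical helper: for each p.onAir below the given elements, walk up
    // to the enclosing div.mainCell and remove that whole cell from the DOM.
    static int removePendingCells(Elements cells) {
        int removed = 0;
        for (Element marker : cells.select("p.onAir")) {
            Element cell = marker;
            while (cell != null && !cell.hasClass("mainCell")) {
                cell = cell.parent();
            }
            // Skip cells that were already detached by an earlier marker.
            if (cell != null && cell.parent() != null) {
                cell.remove();
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(HTML);
        Elements tbody = doc.select("td"); // stands in for the question's list
        int removed = removePendingCells(tbody);
        System.out.println("removed " + removed + " cell(s)");
        System.out.println(doc.select("div.mainCell").eachText());
    }
}
```

Because select returns a separate list, removing cells from the DOM while iterating over the markers does not disturb the loop.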

Rupak answered Sep 22 '22 13:09