I am trying to select the infobox on Wikipedia's Google entry page: http://en.m.wikipedia.org/wiki/Google
So, I call:
Element contentDiv = document.select("div[id=content]").first();
That works as expected. Then I do:
Elements infoboxes = contentDiv.select("table[class=infobox]");
Then I check infoboxes.isEmpty() and I am stunned to discover that it is empty!
I checked and verified that the element contentDiv contains the following:
<table class="infobox vcard" style="width: 22em;" cellspacing="5">
So why does contentDiv.select("table[class=infobox]") return an empty result?
UPDATE: I tested the above with contentDiv.select("table[class=infobox vcard]") and it works fine! This is weird, since I know that, unlike the table.infobox.vcard notation, which only selects the exact multiclass element, table[class=infobox] should select all tables that have at least infobox in their listed classes.
BTW, I tested the code with a different Wikipedia entry containing:
<table class="infobox biota" style="text-align: left; width: 200px; font-size: 100%;">
and there contentDiv.select("table[class=infobox]") behaves exactly as expected, returning that table element as the first item in infoboxes.
Any idea what causes the inconsistency? What could explain this odd behavior?
Is it possible that I just stumbled on a Jsoup bug?
(I'm using jsoup-1.5.2, not the latest but I don't need HTML5 support and for various reasons I can't upgrade immediately to the latest 1.6.1).
The [attributename=attributevalue] selector is an exact match. This is specified in the CSS selector spec:
[att=val]
Match when the element's "att" attribute value is exactly "val".
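To see why only the full string matches, here is a minimal stdlib-only Java model of that [att=val] rule applied to the Google page's table. It illustrates the spec wording only; it is not Jsoup's implementation, and the AttributeEquals class and matches method are hypothetical names.

```java
// Hypothetical model of the CSS [att=val] rule: the entire attribute value
// must equal val exactly. Illustration only, not Jsoup's source code.
final class AttributeEquals {
    static boolean matches(String attributeValue, String val) {
        return attributeValue != null && attributeValue.equals(val);
    }

    public static void main(String[] args) {
        // class attribute of the infobox table on the Google page
        String googleTable = "infobox vcard";
        String[] selectorValues = {"infobox", "infobox vcard"};
        for (String v : selectorValues) {
            System.out.println("table[class=" + v + "] matches: " + matches(googleTable, v));
        }
    }
}
```

Running it prints false for table[class=infobox] and true for table[class=infobox vcard], which is exactly the behaviour described in the question's UPDATE.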
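You want to use the [attributename~=attributevalue] selector instead (be aware that in Jsoup, ~= takes a regular expression and matches it anywhere in the value, so it would also match a class such as "noinfobox"):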
Elements infoboxes = contentDiv.select("table[class~=infobox]");
// ...
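Jsoup documents [attr~=regex] as a regular-expression match on the attribute value rather than the CSS whitespace-separated word match. The sketch below approximates that with java.util.regex, using Matcher.find so the pattern may occur anywhere in the value. It is an illustration, not Jsoup's source; the AttributeRegex class and the "noinfobox" value are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical model of Jsoup's [attr~=regex]: the pattern only has to be
// found somewhere in the attribute value (Matcher.find), so this behaves
// like a substring search, not a whole-class-name match.
final class AttributeRegex {
    static boolean matches(String attributeValue, String regex) {
        return attributeValue != null
                && Pattern.compile(regex).matcher(attributeValue).find();
    }

    public static void main(String[] args) {
        // "noinfobox" is a made-up value to show the substring behaviour
        String[] classValues = {"infobox vcard", "infobox biota", "infobox", "noinfobox"};
        for (String value : classValues) {
            System.out.println("class=\"" + value + "\" -> " + matches(value, "infobox"));
        }
    }
}
```

All four values match, including the hypothetical "noinfobox", which is why a class selector is usually the safer choice for class names.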
or, better, the .classname selector, which matches a whole class name within the element's class list:
Elements infoboxes = contentDiv.select("table.infobox");
// ...
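The class selector treats the class attribute as a whitespace-separated list and looks for one entry equal to the given name. A minimal stdlib-only sketch of that rule follows; the ClassSelector class and hasClass method are hypothetical names, and this models the CSS semantics rather than reproducing Jsoup's code.

```java
// Hypothetical model of the CSS class selector (.infobox): split the class
// attribute on whitespace and require one token to equal the name exactly.
final class ClassSelector {
    static boolean hasClass(String classAttribute, String className) {
        if (classAttribute == null) {
            return false;
        }
        for (String token : classAttribute.trim().split("\\s+")) {
            if (token.equals(className)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // "noinfobox" is a made-up value; it is a different class name, so it is rejected
        String[] classValues = {"infobox vcard", "infobox biota", "infobox", "noinfobox"};
        for (String value : classValues) {
            System.out.println("class=\"" + value + "\" -> " + hasClass(value, "infobox"));
        }
    }
}
```

The first three values match and "noinfobox" does not, which is the behaviour the question expected from table[class=infobox].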
See also the Jsoup Selector API documentation.
As to your test with the different Wikipedia entry, I can't reproduce it. However, that page also contains another <table class="infobox">, which must be the one you're actually retrieving.
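Applying the exact-match rule to that second page shows why the plain table is what comes back: "infobox biota" can never equal "infobox", so whatever order the tables appear in, only a table whose class is exactly infobox can be selected. A stdlib-only sketch, where each table is represented only by its class value; the ExactSelect class is hypothetical and the order of the list is an assumption, not taken from the real page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of contentDiv.select("table[class=infobox]") on the
// second page: tables are represented only by their class attribute value.
final class ExactSelect {
    static List<String> select(List<String> tableClasses, String val) {
        List<String> result = new ArrayList<>();
        for (String cls : tableClasses) {
            if (cls.equals(val)) { // [class=val] is a plain string comparison
                result.add(cls);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed document order; the real page's order is not known here
        List<String> page = Arrays.asList("infobox biota", "infobox");
        List<String> infoboxes = select(page, "infobox");
        System.out.println(infoboxes);
        System.out.println("first is the biota table: " + infoboxes.get(0).equals("infobox biota"));
    }
}
```

It prints [infobox] and false: the "first item" found on that page is the plain infobox table, not the infobox biota one.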