I am trying to parse the HTML below using jsoup but can't work out the right syntax for it.
<div class="info"><strong>Line 1:</strong> some text 1<br>
<b>some text 2</b><br>
<strong>Line 3:</strong> some text 3<br>
</div>
I need to capture some text 1, some text 2 and some text 3 in three different variables.
I have the XPath for the first line (line 3 should be similar) but am unable to work out the equivalent CSS selector.
//div[@class='info']/strong[1]/following::text()
On a separate note, I have a few hundred HTML files and need to parse them and extract data to store in a database. Is jsoup the best choice for this?
As far as I can tell, jsoup's CSS selectors only match elements, so they can't pick the loose text out of an element with mixed content directly. Here is a solution that applies the XPath you formulated, using XOM and TagSoup:
import java.io.IOException;

import nu.xom.Builder;
import nu.xom.Document;
import nu.xom.Nodes;
import nu.xom.ParsingException;
import nu.xom.ValidityException;
import nu.xom.XPathContext;

import org.ccil.cowan.tagsoup.Parser;
import org.xml.sax.SAXException;

public class HtmlTest {
    public static void main(final String[] args) throws SAXException, ValidityException, ParsingException, IOException {
        final String html = "<div class=\"info\"><strong>Line 1:</strong> some text 1<br><b>some text 2</b><br><strong>Line 3:</strong> some text 3<br></div>";

        // TagSoup turns the loose HTML into well-formed SAX events that XOM can build a tree from.
        final Parser parser = new Parser();
        final Builder builder = new Builder(parser);
        final Document document = builder.build(html, null);
        final nu.xom.Element root = document.getRootElement();

        // The parsed elements live in the root's namespace, so bind it to the "xhtml" prefix for the query.
        final Nodes textElements = root.query("//xhtml:div[@class='info']/xhtml:strong[1]/following::text()", new XPathContext("xhtml", root.getNamespaceURI()));

        // Print every text node that follows the first <strong>.
        for (int textNumber = 0; textNumber < textElements.size(); ++textNumber) {
            System.out.println(textElements.get(textNumber).toXML());
        }
    }
}
This outputs:
some text 1
some text 2
Line 3:
some text 3
Without knowing more specifics of what you're trying to do though, I'm not sure if this is exactly what you want.
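If the goal is exactly the three values from the question, the output above has one line too many ("Line 3:"), because following::text() returns every later text node. A rough sketch of one way to narrow it is below: take only the first following-sibling text node of each strong, and read the b element directly. To keep it dependency-free it uses the JDK's built-in javax.xml.xpath instead of XOM and TagSoup, and it assumes the markup has already been made well-formed (note the hand-written <br/>). The class name InfoXPathSketch, the SAMPLE constant and the extract method are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class InfoXPathSketch {
    // Assumption: a hand-tidied, well-formed copy of the snippet (self-closing <br/>).
    static final String SAMPLE =
        "<div class=\"info\"><strong>Line 1:</strong> some text 1<br/>"
        + "<b>some text 2</b><br/>"
        + "<strong>Line 3:</strong> some text 3<br/></div>";

    // Returns the three values in document order, trimmed of surrounding whitespace.
    static String[] extract(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            String base = "//div[@class='info']";
            return new String[] {
                // [1] on a forward axis picks the nearest following sibling text node.
                xpath.evaluate(base + "/strong[1]/following-sibling::text()[1]", doc).trim(),
                // The middle value is wrapped in <b>, so its string value is enough.
                xpath.evaluate(base + "/b", doc).trim(),
                xpath.evaluate(base + "/strong[2]/following-sibling::text()[1]", doc).trim()
            };
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        for (String part : extract(SAMPLE)) {
            System.out.println(part);
        }
    }
}
```

With the XOM/TagSoup setup above, the same location steps should carry over, but each element name would need the xhtml: prefix, as in the original query.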