I am writing a Java program to extract HTML data for a project. This is the HTML code:
<td align="left" valign="top" class="style3">
PC / Van<br>$14 (Mon-Fri, excl PH)
<br>
$18 (Sat, Sun & PH)<br><br>$70/Day(Mon-Fri, excl PH: Entry - 24:00)
<br>
$100/day (Sat, Sun & PH: Entry - 24:00)
</td></tr>
The following is my Java code for extraction:
String connect1 = url1.toString();
Document doc1 = Jsoup.connect(connect1).get();
// select every element whose class attribute is "style3"
Elements type1 = doc1.select("[class=\"style3\"]");
int size = type1.size();
try {
String text = type1.first().text();
System.out.println(text);
} catch (Exception e) {
e.printStackTrace();
}
The output I get is
PC / Van$14 (Mon-Fri, excl PH)$18 (Sat, Sun & PH)$70/Day(Mon-Fri, excl PH: Entry - 24:00)$100/day (Sat, Sun & PH: Entry - 24:00)
How can I split the text at the <br> tags?
The parse(String html) method parses the input HTML into a new Document. This Document object can be used to traverse the HTML DOM and read its details.
Jsoup parses the source code as delivered from the server (or in this case loaded from file). It does not invoke client-side actions such as JavaScript or CSS DOM manipulation.
You can replace every <br> tag with a newline character (\n). A code example is shown below:
Document doc1 = Jsoup.parse(s);
Elements type1 = doc1.select("[class=\"style3\"]");
try {
String text =type1.first().html();
text = text.replaceAll("<br>", "\n");
System.out.println(text);
} catch (Exception e) {
e.printStackTrace();
}
Or split the HTML into a String array on the <br> tag:
Document doc1 = Jsoup.parse(s);
Elements type1 = doc1.select("[class=\"style3\"]");
try {
String text =type1.first().html();
String[] textSplitResult = text.split("<br>");
if (null != textSplitResult) {
for (String t : textSplitResult) {
System.out.println(t);
}
}
} catch (Exception e) {
e.printStackTrace();
}
Or use a Java 8 stream to print the result:
String text =type1.first().html();
String[] textSplitResult = text.split("<br>");
if (null != textSplitResult) {
// forEach is a terminal operation that always runs; peek(...).count() may skip the peek action on Java 9+
Arrays.stream(textSplitResult).forEach(x -> System.out.println(x));
//or Arrays.stream(textSplitResult).forEach(System.out::println);
}
The output:
PC / Van
$14 (Mon-Fri, excl PH)
$18 (Sat, Sun & PH)
$70/Day(Mon-Fri, excl PH: Entry - 24:00)
$100/day (Sat, Sun & PH: Entry - 24:00)
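The split above matches the literal string <br> only, and the doubled <br><br> in the sample leaves an empty entry in the array. As a rough sketch using only the standard library, the helper below (the class name BrSplitter and the method splitOnBr are made up for illustration) splits on <br>, <br/> and <BR>, trims surrounding whitespace, and drops empty pieces. It also turns &amp; back into &, on the assumption that the HTML string may contain escaped ampersands; other entities are not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class BrSplitter {
    // Matches <br>, <br/>, <br /> and <BR>, plus any whitespace around the tag.
    private static final Pattern BR =
            Pattern.compile("\\s*<br\\s*/?>\\s*", Pattern.CASE_INSENSITIVE);

    /** Splits an HTML fragment on br tags and drops empty pieces. */
    public static List<String> splitOnBr(String html) {
        List<String> lines = new ArrayList<>();
        for (String part : BR.split(html)) {
            // Only &amp; is unescaped here; this is an illustrative shortcut.
            String line = part.trim().replace("&amp;", "&");
            if (!line.isEmpty()) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // Hypothetical input modelled on the cell content from the question.
        String html = "PC / Van<br>$14 (Mon-Fri, excl PH)\n<br>\n"
                + "$18 (Sat, Sun &amp; PH)<br><br>"
                + "$70/Day(Mon-Fri, excl PH: Entry - 24:00)\n<br>\n"
                + "$100/day (Sat, Sun &amp; PH: Entry - 24:00)";
        splitOnBr(html).forEach(System.out::println);
    }
}
```

With Jsoup, the string passed in would be the result of type1.first().html(), as in the snippets above.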