I can't seem to load a local HTML file using the Jsoup library, or at the very least Jsoup doesn't seem to recognise it. When I hardcode the exact HTML from the local file as a string (the variable 'html') and parse that instead of the file input, the code works perfectly. In both cases the file is reported as readable.
import java.io.File;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class FileHtmlParser {
    public String input;

    // constructor
    public FileHtmlParser(String inputFile) {
        input = inputFile;
    }

    // methods
    public FileHtmlParser execute() {
        File file = new File(input);
        System.out.println("The file can be read: " + file.canRead());
        String html = "<html><head><title>First parse</title><meta>106</meta> <meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\" /></head>"
                + "<body><p>Parsed HTML into a doc.</p>"
                + "<div id=\"navbar\">this is the div</div></body></html>";
        Document doc = Jsoup.parseBodyFragment(input);
        Elements content = doc.getElementsByTag("div");
        if (content.hasText()) {
            System.out.println("result is " + content.outerHtml());
        } else {
            System.out.println("nothing!");
        }
        return this;
    }
} /*endOfClass*/
Result when:
Document doc = Jsoup.parseBodyFragment(html)
The file can be read: true
result is <div id="navbar">
this is the div
</div>
Result when:
Document doc = Jsoup.parseBodyFragment(input)
The file can be read: true
nothing!
Your mistake is in assuming that Jsoup.parseBodyFragment() can tell whether you're passing it a filename or a string of HTML markup. It can't: Jsoup.parseBodyFragment(input) expects input to be a String that contains HTML markup, not a filename. Given a path, it parses the path text itself as body content, so the resulting document contains no div.
To parse from a file, use the Jsoup.parse(File in, String charsetName) method instead:
File in = new File(input);
// A null charset lets jsoup detect it from the http-equiv meta tag, falling back to UTF-8.
Document doc = Jsoup.parse(in, null);
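If you would rather keep calling parseBodyFragment(), another option is to read the file into a String yourself and pass that string in. The following is only a minimal sketch using the standard library: the class name PathVersusContents and the helper readHtml are made up for illustration, it assumes the file is UTF-8, and the jsoup call appears only as a comment so the snippet compiles without jsoup on the classpath.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class PathVersusContents {

    // Hypothetical helper: reads the whole file as UTF-8 text.
    // The returned String is the markup itself, not the path to it.
    static String readHtml(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Create a throwaway file so the sketch is self-contained.
        Path file = Files.createTempFile("sample", ".html");
        Files.write(file,
                "<div id=\"navbar\">this is the div</div>".getBytes(StandardCharsets.UTF_8));

        String input = file.toString(); // the path text, which is what the question passed in
        String html = readHtml(file);   // the file's contents, which is what the parser expects

        System.out.println(input.contains("<div")); // false
        System.out.println(html.contains("<div"));  // true

        // With jsoup available, the contents can then be parsed:
        // Document doc = Jsoup.parseBodyFragment(html);

        Files.delete(file);
    }
}
```

Jsoup.parse(File, String) is usually the simpler choice, since it also handles charset detection for you. Reading the file yourself mainly helps when you already need the raw text for something else.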