I am using Html.fromHtml(STRING).toString() to convert a string that may or may not contain HTML and/or HTML entities into a plain-text string.
This is pretty slow: by my last measurement it took about 22 ms on average. With a large batch of these, it can add over a minute. So I am looking for a faster option that is built for performance.
Is there any way to speed this up, or are there other decoding options available?
Edit: Since there doesn't appear to be a built-in method that is faster or built specifically for performance, I will award the bounty to anyone who can point me in the direction of a library that is faster than Html.fromHtml(String).toString();
As a note, I already tried Jsoup with this method: Jsoup.parse(String).text()
and it was slower.
What about unescapeHtml() from org.apache.commons.lang.StringEscapeUtils? The library is available on the Apache site.
(EDIT: June 2019 - See the comments below for updates about the library)
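Note that unescapeHtml() only decodes entities; it does not strip tags. If you need both and adding a library is not an option, a hand-rolled single pass over the string is one thing to try. The following is only a minimal sketch in plain Java, not how Html.fromHtml() or unescapeHtml() is implemented. The class name FastHtmlText and the method toPlainText are hypothetical, the entity table covers just a handful of names, and the tag handling is naive (it drops everything between '<' and the next '>', does not turn <br> into a newline, and does not remove script or style content), so the output will not match fromHtml() exactly. Whether it is actually faster on your data is something you would have to measure.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: strips tags and decodes a few entities in one pass. */
final class FastHtmlText {
    // Assumption: only a small set of named entities is needed; extend as required.
    private static final Map<String, Character> NAMED = new HashMap<>();
    static {
        NAMED.put("amp", '&');
        NAMED.put("lt", '<');
        NAMED.put("gt", '>');
        NAMED.put("quot", '"');
        NAMED.put("apos", '\'');
        NAMED.put("nbsp", '\u00A0');
    }

    private FastHtmlText() {}

    static String toPlainText(String in) {
        // Fast path: no tags and no entities, so return the input untouched.
        if (in.indexOf('<') < 0 && in.indexOf('&') < 0) return in;

        int n = in.length();
        StringBuilder out = new StringBuilder(n);
        int i = 0;
        while (i < n) {
            char c = in.charAt(i);
            if (c == '<') {
                int close = in.indexOf('>', i + 1);
                if (close < 0) {            // unterminated tag: keep the rest literally
                    out.append(in, i, n);
                    break;
                }
                i = close + 1;              // skip the whole tag
            } else if (c == '&') {
                int semi = in.indexOf(';', i + 1);
                // Only look at short candidates so a stray '&' stays cheap.
                int cp = (semi > i + 1 && semi - i <= 10)
                        ? decode(in.substring(i + 1, semi)) : -1;
                if (cp >= 0) {
                    out.appendCodePoint(cp);
                    i = semi + 1;
                } else {                    // unknown entity: keep the '&' as-is
                    out.append(c);
                    i++;
                }
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    /** Returns the decoded code point, or -1 if the entity is not recognized. */
    private static int decode(String name) {
        if (name.charAt(0) == '#') {
            try {
                boolean hex = name.length() > 1
                        && (name.charAt(1) == 'x' || name.charAt(1) == 'X');
                int cp = hex ? Integer.parseInt(name.substring(2), 16)
                             : Integer.parseInt(name.substring(1));
                return Character.isValidCodePoint(cp) ? cp : -1;
            } catch (NumberFormatException e) {
                return -1;
            }
        }
        Character ch = NAMED.get(name);
        return ch == null ? -1 : ch;
    }
}
```

With this sketch, FastHtmlText.toPlainText("<b>Tom &amp; Jerry</b>") returns "Tom & Jerry", and unrecognized entities such as &bogus; are left in the text unchanged.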
fromHtml() does not have a high-performance HTML parser, and I have no idea how quick the toString() implementation on SpannedString is. I doubt either was designed for your scenario.
Ideally, the strings are clean before they get to a low-power phone. Either clean them up in the build process (for resources/assets), or clean them up on a server (before you download them).
If, for whatever reason, you absolutely need to clean them up on the device, you can perhaps use the NDK to create a C/C++ library that does the cleaning for you faster.