
What's the difference between Android's Html.escapeHtml and TextUtils.htmlEncode? When should I use one or the other?

Android has two different ways to escape / encode HTML characters / entities in Strings:

  • Html.escapeHtml(String), added in API 16 (Android 4.1). The docs say:

    Returns an HTML escaped representation of the given plain text.

  • TextUtils.htmlEncode(String). For this one, the docs say:

    Html-encode the string.

Reading the docs, they both seem to do pretty much the same thing, but, when testing them, I get some pretty mysterious (to me) output.

E.g., with the input: <p>This is a quote ". This is a euro symbol: €. <b>This is some bold text</b></p>

  • Html.escapeHtml gives:

    &lt;p&gt;This is a quote ". This is a euro symbol: &#8364;. &lt;b&gt;This is some bold text&lt;/b&gt;&lt;/p&gt;
    
  • Whereas TextUtils.htmlEncode gives:

    &lt;p&gt;This is a quote &quot;. This is a euro symbol: €. &lt;b&gt;This is some bold text&lt;/b&gt;&lt;/p&gt;
    

So it seems that the second escapes the quote (") but the first doesn't, while the first encodes the euro symbol but the second doesn't. I'm confused.


So what's the difference between these two methods? Which characters does each escape / encode? What's the difference between encoding and escaping here? When should I use one or the other (or should I, gasp, use them both together?)?

JonasCz asked Jan 30 '16 16:01


1 Answer

You can compare their sources:

This is what Html.escapeHtml uses underneath:

https://github.com/android/platform_frameworks_base/blob/d59921149bb5948ffbcb9a9e832e9ac1538e05a0/core/java/android/text/Html.java#L387

This is TextUtils.htmlEncode:

https://github.com/android/platform_frameworks_base/blob/d59921149bb5948ffbcb9a9e832e9ac1538e05a0/core/java/android/text/TextUtils.java#L1361

As you can see, the latter only escapes the characters that are reserved for markup in HTML (<, >, &, ' and "), while the former escapes <, > and & and also encodes non-ASCII characters as numeric character references, so the result can be represented in ASCII.
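To make the difference concrete, here is a minimal plain-Java sketch that approximates what the two linked sources do. It is not the Android implementation: the class and method names (EscapeSketch, encodeMarkupOnly, escapeToAscii) are made up for illustration, and details of the real Html.escapeHtml such as its handling of runs of spaces and of unpaired surrogates are left out.

```java
final class EscapeSketch {

    // Rough stand-in for TextUtils.htmlEncode (hypothetical name):
    // replaces only the characters that are significant in HTML markup.
    static String encodeMarkupOnly(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '&':  out.append("&amp;");  break;
                case '\'': out.append("&#39;");  break;
                case '"':  out.append("&quot;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    // Rough stand-in for Html.escapeHtml (hypothetical name):
    // escapes <, > and &, and turns everything outside printable ASCII
    // into a numeric character reference. Quotes are left alone.
    static String escapeToAscii(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);          // handles surrogate pairs
            i += Character.charCount(cp);
            if (cp == '<') {
                out.append("&lt;");
            } else if (cp == '>') {
                out.append("&gt;");
            } else if (cp == '&') {
                out.append("&amp;");
            } else if (cp > 0x7E || cp < ' ') {
                out.append("&#").append(cp).append(';');
            } else {
                out.append((char) cp);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The input from the question; \u20AC is the euro symbol.
        String input = "<p>This is a quote \". This is a euro symbol: \u20AC. "
                + "<b>This is some bold text</b></p>";
        System.out.println(escapeToAscii(input));
        System.out.println(encodeMarkupOnly(input));
    }
}
```

Run on the question's input, the two methods of this sketch reproduce the two outputs shown in the question: the first leaves the quote alone and writes the euro symbol as &#8364;, the second writes &quot; and leaves the euro symbol as-is.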

Thus, if your input only contains ASCII (basic Latin) characters, which is unlikely nowadays, or you have set up Unicode in your HTML page properly, you can go with TextUtils.htmlEncode. If you need to ensure that your text works even when transmitted via 7-bit channels, use Html.escapeHtml.

As for the different treatment of the quote character ("): it only needs to be escaped inside attribute values (see the spec), so if you are not putting your text there, you should be fine.
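A small plain-Java sketch of why the quote matters in attribute values and not in element content. The helper names (AttributeQuoteSketch, escapeWithoutQuotes, escapeWithQuotes, titleAttribute) are hypothetical and only stand in for "an escaper that skips quotes" and "an escaper that handles them"; they are not Android APIs.

```java
final class AttributeQuoteSketch {

    // Escapes only <, > and & (quotes pass through unchanged).
    static String escapeWithoutQuotes(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;");
    }

    // Additionally escapes double and single quotes.
    static String escapeWithQuotes(String s) {
        return escapeWithoutQuotes(s)
                .replace("\"", "&quot;")
                .replace("'", "&#39;");
    }

    // Places an already-escaped value inside a double-quoted attribute.
    static String titleAttribute(String escapedValue) {
        return "<span title=\"" + escapedValue + "\">x</span>";
    }

    public static void main(String[] args) {
        String text = "5\" screen";

        // The raw quote ends the attribute early: title="5" screen"
        System.out.println(titleAttribute(escapeWithoutQuotes(text)));

        // The escaped quote stays inside the attribute value.
        System.out.println(titleAttribute(escapeWithQuotes(text)));
    }
}
```

In element content (between tags) both variants would be safe for this input, which is why the missing quote escaping only becomes a problem once the text is placed in an attribute.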

Thus, my personal choice would be Html.escapeHtml, as it seems to be more versatile.

Mikhail Naganov answered Oct 24 '22 02:10
