
Java how to encode single quote and double quote into HTML entities?

Tags: java, html

How can I encode " into &quot; and ' into &#39;?

I am quite surprised that the single quote and double quote are not defined in HTML Entities 4.0, and so StringEscapeUtils is not able to escape these 2 characters into their respective entities.

Is there any other String-related tool able to do this?

Is there any reason why the single quote and double quote are not defined in HTML Entities 4.0?

Besides the single quote and double quote, is there any framework able to encode all Unicode characters into their respective entities? Since any Unicode character can be manually translated into a decimal entity and shown in HTML, I wonder whether there is a tool able to convert them automatically.

Sam YC asked Dec 20 '22 03:12


1 Answer

  1. Single quote and double quote not defined in HTML 4.0

Only the single quote is undefined in HTML 4.0; the double quote has been defined as &quot; since HTML 2.0.

  2. StringEscapeUtils not able to escape these 2 characters into respective entities

escapeXml11 in StringEscapeUtils supports converting the single quote into &apos;, and escapeHtml4 converts the double quote into &quot;.

For example:

StringEscapeUtils.escapeXml11("'"); // returns &apos;
StringEscapeUtils.escapeHtml4("\""); // returns &quot;

  3. Is there any other String-related tool able to do this?

HtmlUtils from the Spring Framework handles both single and double quotes. It can also convert characters to decimal references (such as &#39; and &#34;). The following example is taken from the answer to this question:

import org.springframework.web.util.HtmlUtils;
[...]
HtmlUtils.htmlEscapeDecimal("&"); // gives &#38;
HtmlUtils.htmlEscape("&"); // gives &amp;

  4. Any reason why the single quote and double quote are not defined in HTML Entities 4.0?

As per Character entity references in HTML 4, the single quote is not defined. The double quote has been available since HTML 2.0, whereas the single quote (&apos;) is supported as part of XHTML 1.0.
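To make the distinction concrete, here is a minimal sketch of a hand-rolled escaper that sticks to references valid in HTML 4: named entities where they exist, and the numeric reference &#39; for the single quote. The class name Html4QuoteEscaper and the method escape are hypothetical names chosen for illustration; they do not come from Commons Lang or Spring.

```java
// Hypothetical helper for illustration; not part of any library mentioned here.
class Html4QuoteEscaper {

    // Escapes the five HTML-special characters using only HTML 4 references.
    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break; // numeric, since HTML 4 has no &apos;
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: It&#39;s &quot;good&quot; &amp; &lt;safe&gt;
        System.out.println(escape("It's \"good\" & <safe>"));
    }
}
```

Using &#39; rather than &apos; keeps the output valid in both HTML 4 and XHTML, at the cost of being slightly less readable in the page source.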

  5. Tool or method to encode all Unicode characters into their respective entities

There is a very good and simple Java implementation mentioned as part of an answer to this question.

The following sample program is based on that answer:

// Note: deprecated since Commons Lang 3.6 in favor of
// org.apache.commons.text.StringEscapeUtils (Apache Commons Text).
import org.apache.commons.lang3.StringEscapeUtils;

public class HTMLCharacterEscaper {
    public static void main(String[] args) {
        // With StringEscapeUtils
        System.out.println("Using SEU: " + StringEscapeUtils.escapeHtml4("\" ¶"));
        System.out.println("Using SEU: " + StringEscapeUtils.escapeXml11("'"));

        // Single quote & double quote
        System.out.println(escapeHTML("It's good"));
        System.out.println(escapeHTML("\" Grit \""));

        // Unicode characters
        System.out.println(escapeHTML("This is copyright symbol ©"));
        System.out.println(escapeHTML("Paragraph symbol ¶"));
        System.out.println(escapeHTML("This is pound £"));
    }

    // Replaces every non-ASCII character and the HTML-special characters
    // " < > & ' with a decimal numeric character reference (&#NNN;).
    public static String escapeHTML(String s) {
        StringBuilder out = new StringBuilder(Math.max(16, s.length()));
        // Walk by code point so that a character outside the BMP yields one
        // reference instead of two surrogate halves.
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (cp > 127 || cp == '"' || cp == '<' || cp == '>' || cp == '&' || cp == '\'') {
                out.append("&#").append(cp).append(';');
            } else {
                out.append((char) cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
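For reference, a decimal character reference is simply the character's Unicode code point written in base 10. The small sketch below shows which values to expect for the sample characters used above; DecimalReferenceDemo and decimalReference are hypothetical names used only for this illustration.

```java
// Hypothetical demo class; shows the code point behind each decimal reference.
class DecimalReferenceDemo {

    // Builds the decimal reference for a single Unicode code point.
    static String decimalReference(int codePoint) {
        return "&#" + codePoint + ";";
    }

    public static void main(String[] args) {
        System.out.println(decimalReference('\''));     // single quote:   &#39;
        System.out.println(decimalReference('"'));      // double quote:   &#34;
        System.out.println(decimalReference('\u00A9')); // copyright sign: &#169;
        System.out.println(decimalReference('\u00B6')); // paragraph sign: &#182;
        System.out.println(decimalReference('\u00A3')); // pound sign:     &#163;

        // A character beyond U+FFFF is stored as two chars (a surrogate pair),
        // so its code point has to be read with codePointAt, not charAt.
        String grinningFace = "\uD83D\uDE00"; // U+1F600
        System.out.println(decimalReference(grinningFace.codePointAt(0))); // &#128512;
    }
}
```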

Here are some interesting links I came across while researching this answer:

  • Common HTML entities used for typography
  • Why shouldn't &apos; be used to escape single quotes?
  • The Named Character Reference &apos;
  • HTML apostrophe

bprasanna answered Feb 23 '23 01:02