Is there a Java utility to verify if a string is a valid HTML escape character?

I want a method in the following format:

public boolean isValidHtmlEscapeCode(String string);

Usage would be:

isValidHtmlEscapeCode("A") == false
isValidHtmlEscapeCode("&#1513;") == true // Valid unicode character
isValidHtmlEscapeCode("&#x5E9;") == true // same as 1513 but in HEX
isValidHtmlEscapeCode("�") == false // Invalid unicode character

I wasn't able to find anything that does that - is there any utility that does that? If not, is there any smart way to do it?

RonK asked Dec 20 '12 15:12


4 Answers

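import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Compiled once: a hex reference, a decimal reference, or a named entity.
private static final Pattern ENTITY = Pattern
        .compile("&(?:#x([0-9a-fA-F]+)|#([0-9]+)|([0-9A-Za-z]+));");

public static boolean isValidHtmlEscapeCode(String string) {
    if (string == null) {
        return false;
    }
    Matcher m = ENTITY.matcher(string);

    // matches() requires the whole string to be a single entity;
    // find() would also accept text that merely contains one.
    if (m.matches()) {
        int codePoint = -1;
        String entity = null;
        try {
            if ((entity = m.group(1)) != null) {
                // 0x10FFFF, the largest code point, needs at most 6 hex digits.
                if (entity.length() > 6) {
                    return false;
                }
                codePoint = Integer.parseInt(entity, 16);
            } else if ((entity = m.group(2)) != null) {
                // 1114111 (0x10FFFF) has 7 decimal digits.
                if (entity.length() > 7) {
                    return false;
                }
                codePoint = Integer.parseInt(entity, 10);
            } else if ((entity = m.group(3)) != null) {
                // namedEntities is a Set<String> of entity names, defined elsewhere.
                return namedEntities.contains(entity);
            }
            // Accept 0..0x10FFFF except the surrogate range 0xD800..0xDFFF.
            return 0x00 <= codePoint && codePoint < 0xd800
                    || 0xdfff < codePoint && codePoint <= 0x10FFFF;
        } catch (NumberFormatException e) {
            return false;
        }
    } else {
        return false;
    }
}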

Here's the set of named entities (the contents of namedEntities): http://pastebin.com/XzzMYDjF
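For readers who want something that compiles on its own, here is a minimal sketch of the same idea using only the JDK. The class name `HtmlEscapeCheck` and the five-entry `NAMED_ENTITIES` set are assumptions made for illustration; a real check would load the full entity list. It also assumes the whole string must be exactly one entity, and it leans on `Character.isValidCodePoint` for the upper bound.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HtmlEscapeCheck {
    // Hypothetical subset for illustration only; the full list is much longer.
    static final Set<String> NAMED_ENTITIES =
            new HashSet<>(Arrays.asList("amp", "lt", "gt", "quot", "nbsp"));

    // Group 1: hex digits, group 2: decimal digits, group 3: entity name.
    // The digit limits keep Integer.parseInt from overflowing.
    static final Pattern ENTITY = Pattern.compile(
            "&(?:#[xX]([0-9a-fA-F]{1,6})|#([0-9]{1,7})|([0-9A-Za-z]+));");

    static boolean isValidHtmlEscapeCode(String s) {
        if (s == null) {
            return false;
        }
        Matcher m = ENTITY.matcher(s);
        if (!m.matches()) {            // the whole string must be one entity
            return false;
        }
        if (m.group(3) != null) {      // named entity
            return NAMED_ENTITIES.contains(m.group(3));
        }
        int codePoint = (m.group(1) != null)
                ? Integer.parseInt(m.group(1), 16)
                : Integer.parseInt(m.group(2), 10);
        // In range 0..0x10FFFF and not a UTF-16 surrogate.
        boolean surrogate = codePoint >= Character.MIN_SURROGATE
                && codePoint <= Character.MAX_SURROGATE;
        return Character.isValidCodePoint(codePoint) && !surrogate;
    }

    public static void main(String[] args) {
        System.out.println(isValidHtmlEscapeCode("A"));          // false
        System.out.println(isValidHtmlEscapeCode("&#1513;"));    // true
        System.out.println(isValidHtmlEscapeCode("&#x5E9;"));    // true
        System.out.println(isValidHtmlEscapeCode("&#1114112;")); // false (0x110000)
        System.out.println(isValidHtmlEscapeCode("&#xD800;"));   // false (surrogate)
        System.out.println(isValidHtmlEscapeCode("&amp;"));      // true
    }
}
```

Unlike the method above, this sketch puts the digit limits in the regex itself, so over-long numeric references simply fail to match and no exception handling is needed.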

Esailija answered Oct 16 '22 23:10


You might want to have a look at StringEscapeUtils from Apache Commons Lang: http://commons.apache.org/lang/api-2.3/org/apache/commons/lang/StringEscapeUtils.html#unescapeHtml(java.lang.String)

With unescapeHtml you could do something like:

String input = "A";
String unescaped = StringEscapeUtils.unescapeHtml(input);
// True only if unescaping changed the string, i.e. it contained an escape.
boolean containsValidEscape = !input.equals(unescaped);

Korgen answered Oct 16 '22 23:10


Not sure if this is a perfect solution, but you can use Apache Commons Lang:

try {
    return StringEscapeUtils.unescapeHtml4(code).length() < code.length();
} catch (IllegalArgumentException e) {
    return false;
}

hoaz answered Oct 17 '22 01:10


This should be the method you wanted:

public static boolean isValidHtmlEscapeCode(String string) {
    String temp = "";
    try {
        temp = StringEscapeUtils.unescapeHtml3(string);
    } catch (IllegalArgumentException e) {
        return false;
    }
    return !string.equals(temp);
}

Aioros answered Oct 16 '22 23:10