I want a method in the following format:
public boolean isValidHtmlEscapeCode(String string);
Usage would be:
isValidHtmlEscapeCode("A") == false
isValidHtmlEscapeCode("&#1513;") == true // Valid unicode character
isValidHtmlEscapeCode("&#x05E9;") == true // same as 1513 but in HEX
isValidHtmlEscapeCode("&#1123123;") == false // Invalid unicode character
I wasn't able to find anything that does that - is there any utility that does that? If not, is there any smart way to do it?
In Java, Apache commons-text provides StringEscapeUtils.escapeHtml4(str) to escape HTML characters. The StringEscapeUtils class in Apache commons-lang3 used to do the same job, but it has been deprecated since 3.6.
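import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Group 1: hex digits, group 2: decimal digits, group 3: entity name.
private static final Pattern ESCAPE_CODE = Pattern
        .compile("&(?:#x([0-9a-fA-F]+)|#([0-9]+)|([0-9A-Za-z]+));");

public static boolean isValidHtmlEscapeCode(String string) {
    if (string == null) {
        return false;
    }
    // find() accepts an escape anywhere in the string;
    // switch to matches() to require that the whole string is one escape.
    Matcher m = ESCAPE_CODE.matcher(string);
    if (!m.find()) {
        return false;
    }
    int codePoint = -1;
    String entity = null;
    try {
        if ((entity = m.group(1)) != null) {
            // Reject overly long hex input before parsing.
            if (entity.length() > 6) {
                return false;
            }
            codePoint = Integer.parseInt(entity, 16);
        } else if ((entity = m.group(2)) != null) {
            // Reject overly long decimal input before parsing.
            if (entity.length() > 7) {
                return false;
            }
            codePoint = Integer.parseInt(entity, 10);
        } else if ((entity = m.group(3)) != null) {
            // namedEntities must hold bare entity names, without '&' and ';'.
            return namedEntities.contains(entity);
        }
        // Valid code points are 0..0x10FFFF, excluding the surrogate range.
        return 0x00 <= codePoint && codePoint < 0xd800
                || 0xdfff < codePoint && codePoint <= 0x10FFFF;
    } catch (NumberFormatException e) {
        return false;
    }
}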
Here's the set of named entities: http://pastebin.com/XzzMYDjF
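For anyone who wants to try this without the full entity list, here is a minimal self-contained sketch of the same idea. It makes a few assumptions that are not in the answer above: the class name EscapeCodeCheck is made up, NAMED_ENTITIES is only a tiny illustrative subset of the real list, Set.of needs Java 9 or later, and it uses matches() so the whole string has to be a single escape rather than merely contain one.

```java
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EscapeCodeCheck {
    // Illustrative subset only (assumption); the real list of HTML named entities is much longer.
    private static final Set<String> NAMED_ENTITIES =
            Set.of("amp", "lt", "gt", "quot", "nbsp");

    // Group 1: hex digits, group 2: decimal digits, group 3: entity name.
    private static final Pattern ESCAPE =
            Pattern.compile("&(?:#x([0-9a-fA-F]+)|#([0-9]+)|([0-9A-Za-z]+));");

    static boolean isValidHtmlEscapeCode(String s) {
        if (s == null) {
            return false;
        }
        Matcher m = ESCAPE.matcher(s);
        // matches() requires the entire string to be exactly one escape.
        if (!m.matches()) {
            return false;
        }
        if (m.group(3) != null) {
            return NAMED_ENTITIES.contains(m.group(3));
        }
        boolean hex = m.group(1) != null;
        String digits = hex ? m.group(1) : m.group(2);
        try {
            int cp = Integer.parseInt(digits, hex ? 16 : 10);
            // In range 0..0x10FFFF and not a surrogate.
            return Character.isValidCodePoint(cp) && !(cp >= 0xD800 && cp <= 0xDFFF);
        } catch (NumberFormatException e) {
            // Too many digits to fit in an int, so certainly out of range.
            return false;
        }
    }
}
```

With this sketch, "&#1513;" and "&#x05E9;" are accepted, while "A" and "&#1123123;" are rejected. Relying on parseInt plus Character.isValidCodePoint avoids the digit-count checks, at the cost of also accepting zero-padded values.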
You might want to have a look at StringEscapeUtils from Apache Commons Lang: http://commons.apache.org/lang/api-2.3/org/apache/commons/lang/StringEscapeUtils.html#unescapeHtml(java.lang.String)
With unescapeHtml you could do something like:
String input = "A";
String unescaped = StringEscapeUtils.unescapeHtml(input);
boolean containsValidEscape = !input.equals(unescaped);
Not sure if this is a perfect solution, but you can use Apache Commons Lang:
try {
    return StringEscapeUtils.unescapeHtml4(code).length() < code.length();
} catch (IllegalArgumentException e) {
    return false;
}
This should be the method you wanted:
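public static boolean isValidHtmlEscapeCode(String string) {
    // Guard against null so the equals() call below cannot throw.
    if (string == null) {
        return false;
    }
    String temp = "";
    try {
        temp = StringEscapeUtils.unescapeHtml3(string);
    } catch (IllegalArgumentException e) {
        return false;
    }
    // If unescaping changed the string, it contained at least one valid escape.
    return !string.equals(temp);
}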