
Java escape HTML - string replace slow?

Tags:

java

I have a Java application that reads a large file, processes it, and passes the result on to SolrEmbeddedServer (http://lucene.apache.org/solr/).

One of the functions does basic HTML escaping:

private String htmlEscape(String input)
{
    return input.replace("&", "&amp;").replace(">", "&gt;").replace("<", "&lt;")
        .replace("'", "&apos;").replaceAll("\"", "&quot;");
}

Profiling shows that the program spends roughly 58% of its time in this function: 47% in replace and 11% in replaceAll.

Is Java's replace really that slow, or am I on the right track and the program is already efficient enough that its bottleneck lies in the Java library rather than in my code? (Or am I doing the replacement wrong?)

Thanks in advance!

cpf asked Dec 30 '25 19:12


2 Answers

For HTML escaping you can use StringEscapeUtils.escapeHtml(input) from commons-lang. It is supposedly implemented in a more efficient way there.

Bozho answered Jan 01 '26 07:01


This is certainly not the most efficient way to do a lot of replacements. Since strings are immutable, each .replace() that finds a match constructs a new String object, and every call scans the whole input again. For the example you give, the five chained calls create up to five String objects per invocation, all but the last of them temporary.
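To illustrate the point about intermediate strings, here is a minimal sketch of a single-pass alternative that walks the input once and appends to one StringBuilder. The class name HtmlEscaper is hypothetical and not from the question. It assumes the same five entities as the original htmlEscape, and it has not been profiled against the asker's data, so treat any speedup as something to measure rather than a given.

```java
public final class HtmlEscaper {
    private HtmlEscaper() {}

    // Hypothetical single-pass stand-in for the question's htmlEscape:
    // same five entities, but only one scan over the input.
    public static String htmlEscape(String input) {
        StringBuilder out = null; // allocated lazily, only when an escape is needed
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            String entity;
            switch (c) {
                case '&':  entity = "&amp;";  break;
                case '<':  entity = "&lt;";   break;
                case '>':  entity = "&gt;";   break;
                case '\'': entity = "&apos;"; break;
                case '"':  entity = "&quot;"; break;
                default:   entity = null;
            }
            if (entity != null) {
                if (out == null) {
                    // First special character: copy the untouched prefix once.
                    out = new StringBuilder(input.length() + 16);
                    out.append(input, 0, i);
                }
                out.append(entity);
            } else if (out != null) {
                out.append(c);
            }
        }
        // No special characters found: return the original string unchanged.
        return out == null ? input : out.toString();
    }
}
```

Because "&" is handled in the same pass as the other characters, the ampersands inside the emitted entities are never escaped a second time, which matches the result of doing the "&" replacement first in the chained version.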

Considering the example you give, the simplest solution is to use an existing library function for HTML entity encoding. Apache Commons StringEscapeUtils is one option. Another one is HTMLEntities.

amarillion answered Jan 01 '26 09:01


