I have a String like -
<phone-residence></phone-residence><marital-status>1</marital-status><phone-on-request></phone-on-request>
I want to remove the hyphens (-) and uppercase the single alpha character following each removed hyphen, i.e. convert from hyphen-delimited words to "camelCase".
Like -
<phoneResidence></phoneResidence><maritalStatus>1</maritalStatus><phoneOnRequest></phoneOnRequest>
How to do this?
Since Java 9 there has been a Matcher#replaceAll() overload that takes a transformation function (a Function<MatchResult, String>, one of the functional interfaces introduced in Java 8) to modify the matched subsequences "on the fly" and build the final output.
First, a warning: regexes are fantastic, incredibly powerful tools for a certain class of problem. Before applying a regex you must determine whether the problem is amenable to one. Ordinarily, processing XML is the antithesis of a regex-amenable problem, except in this case, where the goal is to treat the input as merely a string and not as XML. (However, read the caveat below carefully.)
Here is a famous quote from Jamie Zawinski in 1997:
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.
With those caveats, here's the code for your question:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String input = "<phone-residence></phone-residence><marital-status>1</marital-status><phone-on-request></phone-on-request>";
Matcher m = Pattern.compile("-[a-zA-Z]").matcher(input);
// Do all the replacements in one statement using the functional replaceAll();
// the lambda receives each match as a MatchResult
String result = m.replaceAll(mr -> mr.group().substring(1).toUpperCase());
The regex matches a single hyphen followed by any single alphabetic character, upper or lowercase. The replaceAll() method scans the input using the Matcher. At every match it invokes the lambda (functional shorthand for an anonymous class with a single apply() method), passing in a MatchResult argument that describes the match; calling group() on it returns the matched text. Whatever the lambda returns is then substituted into the output string being built by the replaceAll() method, in place of the matched text.
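For readers who want to see what that scan-and-substitute cycle looks like when written out by hand (for example on a Java 8 runtime, where the function-taking overload is not available), here is a minimal sketch using Matcher#appendReplacement() and Matcher#appendTail(). The class and method names are made up for illustration, and Locale.ROOT is my own addition to keep the uppercasing locale-independent.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the name is illustrative only.
class HyphenToCamelLoop {
    private static final Pattern HYPHEN_LETTER = Pattern.compile("-[a-zA-Z]");

    static String toCamelCase(String input) {
        Matcher m = HYPHEN_LETTER.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Drop the hyphen and uppercase the letter that followed it
            String replacement = m.group().substring(1).toUpperCase(Locale.ROOT);
            // Copies the text since the previous match, then the replacement
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        // Copies whatever follows the last match
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toCamelCase("<phone-on-request></phone-on-request>"));
        // prints <phoneOnRequest></phoneOnRequest>
    }
}
```

The StringBuffer overloads are used because they exist on Java 8; the StringBuilder variants were only added in Java 9.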
Caveat: the solution given above is completely blind to the structure of the XML. It will change any -a combination (where a stands for any letter) and replace it with just A (where A stands for the corresponding upper-case letter), regardless of where it appears.
In the example you gave, this pattern occurred only in the tag names. If, however, there are other parts of the file that contain (or can contain) that pattern, then those instances will also be replaced. This could be a problem if that pattern occurs in text data (i.e. stuff not inside, but between the tags) or in an attribute value. Applying a regex blindly to the entire file is kind of the chainsaw approach. If you really, really need a chainsaw, you use it.
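To make the risk concrete, here is a small sketch that runs the same regex over a made-up input (not from your question) that has hyphens in a tag name, in an attribute value and in the text. It needs Java 9 or later for the function-taking replaceAll().

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the input string is invented for illustration.
class BlindReplaceDemo {
    static String blindCamelCase(String input) {
        return Pattern.compile("-[a-zA-Z]").matcher(input)
                .replaceAll(mr -> mr.group().substring(1).toUpperCase());
    }

    public static void main(String[] args) {
        // Hyphens appear in the tag name, the attribute value and the text
        String input = "<home-phone type=\"land-line\">call-back only</home-phone>";
        System.out.println(blindCamelCase(input));
        // prints <homePhone type="landLine">callBack only</homePhone>
    }
}
```

The tag name comes out as intended, but the attribute value and the text content were rewritten too, which is probably not what you want.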
However, if it turns out a chainsaw is too powerful and your actual task requires more finesse, then you would need to switch to a real XML parser (the JDK includes a good one), which can handle all the subtleties. It delivers to you the various syntactic bits and pieces such as tag names, attribute names, attribute values, text, etc. separately, so that you can explicitly decide which parts are to be affected. You'd still use the replaceAll() above but apply it only to the parts where it is needed.
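As a rough sketch of that parser-based route, the following uses the JDK's built-in DOM parser and renames element names only, leaving attributes and text alone. Several things here are my assumptions rather than part of your question: the class and method names, the choice of DOM (rather than SAX or StAX), and the <root> wrapper, which is needed because a parser only accepts a document with a single root element. It also assumes Java 9 or later and input you trust.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class; names are illustrative only.
class TagRenamer {
    private static final Pattern HYPHEN_LETTER = Pattern.compile("-[a-zA-Z]");

    // Same regex replacement as above, applied to a single name
    static String camel(String name) {
        return HYPHEN_LETTER.matcher(name)
                .replaceAll(mr -> mr.group().substring(1).toUpperCase(Locale.ROOT));
    }

    static String renameTags(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // Take a snapshot first: getElementsByTagName returns a live list
            NodeList live = doc.getElementsByTagName("*");
            List<Element> elements = new ArrayList<>();
            for (int i = 0; i < live.getLength(); i++) {
                elements.add((Element) live.item(i));
            }
            for (Element e : elements) {
                String renamed = camel(e.getTagName());
                if (!renamed.equals(e.getTagName())) {
                    // Only the element name changes; attributes and text stay as-is
                    doc.renameNode(e, e.getNamespaceURI(), renamed);
                }
            }
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter w = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(w));
            return w.toString();
        } catch (Exception ex) {
            // Wrap the checked parser/transformer exceptions to keep the sketch short
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        // <root> is an assumed wrapper element, not part of the original input
        String xml = "<root><phone-residence></phone-residence>"
                + "<marital-status>1</marital-status>"
                + "<note-text>call-back only</note-text></root>";
        System.out.println(renameTags(xml));
    }
}
```

Note that the serializer may write empty elements in the short form (for example as a self-closing tag), so the output is equivalent XML but not necessarily character-for-character what a string replacement would produce.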
Almost as a rule, you will ABSOLUTELY NOT use regexes to process XML, or parse strings containing nested or escaped quotes, or parse CSV or TSV files. Those data formats are not normally suitable domains for using regexes.