I'm building a very trivial template processor. It will only be able to substitute values of variables.
I thought I would first decompose the string into parts (constant parts and variable references). I would then replace all variable references with the corresponding values. Finally I would concatenate all the parts back together.
In order to decompose the string, I would need to slice it in the following way.
A string like this
"UPDATE {ix:tablename} SET value = value + 1 WHERE {ix:column} = {ix:value}"
should result in the following array
[
"UPDATE ",
"{ix:tablename}",
" SET value = value + 1 WHERE ",
"{ix:column}",
" = ",
"{ix:value}"
]
I know this could be done by repeatedly searching for the first opening bracket, then the first closing bracket, and so on. But isn't there a more elegant solution than that (some regexp magic, perhaps)?
You can get the array you want with a regex split:
MyString.split("(?=\\{ix:)|(?<=\\})")
(The { and } need to be escaped as \{ and \} to be literal in regex, and because it's a Java string those \s need to be further escaped as \\.)
In other words: look ahead for {ix: or look behind for }, and split at that position if either is found.
If it is possible for } to be valid in other contexts, I would probably take a different approach.
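To make the split concrete, here is a minimal sketch that runs it on the string from the question and then does the substitute-and-concatenate steps. The class name TemplateSplitDemo, the render helper, and the sample values are hypothetical names made up for illustration. Leaving unknown variables untouched is an assumption of this sketch, not something the question asks for.

```java
import java.util.HashMap;
import java.util.Map;

final class TemplateSplitDemo {
    // Split before every "{ix:" and after every "}" using zero-width lookarounds.
    static String[] splitTemplate(String template) {
        return template.split("(?=\\{ix:)|(?<=\\})");
    }

    // Hypothetical helper: swap each "{ix:name}" part for its value, then rejoin.
    static String render(String template, Map<String, String> values) {
        StringBuilder out = new StringBuilder();
        for (String part : splitTemplate(template)) {
            if (part.startsWith("{ix:") && part.endsWith("}")) {
                String name = part.substring(4, part.length() - 1);
                // Assumption: unknown variables are left in place unchanged.
                out.append(values.getOrDefault(name, part));
            } else {
                out.append(part);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String template =
            "UPDATE {ix:tablename} SET value = value + 1 WHERE {ix:column} = {ix:value}";
        for (String part : splitTemplate(template)) {
            System.out.println("[" + part + "]");
        }
        Map<String, String> values = new HashMap<>();
        values.put("tablename", "counters");
        values.put("column", "id");
        values.put("value", "42");
        System.out.println(render(template, values));
    }
}
```

The zero-width match after the final } would produce a trailing empty string, but String.split with the default limit discards trailing empty strings, so the array has exactly the six parts listed in the question.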
An often forgotten aspect of regex, particularly when it comes to splitting, is that it can match positions, also known as zero-width matches.
Most people are familiar with positional matches, like ^ and \b, but fewer are well acquainted with lookarounds, which allow ad-hoc conditions to be specified.
When a regex contains only positional matching constructs, although there are no characters included in the match, regex still records the position where the match occurred - most string operations just need a position and a length, and a length of 0 still allows a split (or replace) to occur at the designated position.
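As a small illustration of a zero-length match still driving a split or a replace, here is a sketch using a lookahead for an uppercase letter. The class name ZeroWidthDemo, the method names, and the sample input are made up for the example.

```java
final class ZeroWidthDemo {
    // Split at every position that is immediately followed by an uppercase letter.
    // No characters are consumed, so nothing is lost from the pieces.
    static String[] splitBeforeUpper(String s) {
        return s.split("(?=[A-Z])");
    }

    // Insert an underscore at the same zero-width positions.
    static String markBeforeUpper(String s) {
        return s.replaceAll("(?=[A-Z])", "_");
    }

    public static void main(String[] args) {
        // Pieces: "camel", "Case", "String"
        for (String piece : splitBeforeUpper("camelCaseString")) {
            System.out.println(piece);
        }
        // Prints: camel_Case_String
        System.out.println(markBeforeUpper("camelCaseString"));
    }
}
```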
Lookaheads and lookbehinds allow you to match positions by specifying sub-expressions which are checked going forwards (ahead) and backwards (behind) in the string from the position they are being tested at.
In syntax terms, a lookahead looks like (?=subexpr) whilst a lookbehind looks like (?<=subexpr).
There are negated versions - for when the pattern must fail to be considered successful - which are (?!subexpr) and (?<!subexpr) respectively.
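The four forms can be compared side by side with a sketch like the one below. The class LookaroundDemo, the findAll helper, and the sample input are hypothetical. The \b in the two negated patterns is an addition of this sketch, there so the engine can't satisfy the negation by matching only part of a number.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LookaroundDemo {
    // Collect every match of regex in input, in order of appearance.
    static List<String> findAll(String regex, String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        String input = "$5 and 7 and $12";
        // Lookahead: digits followed by " and" -> [5, 7]
        System.out.println(findAll("\\d+(?= and)", input));
        // Lookbehind: digits preceded by "$" -> [5, 12]
        System.out.println(findAll("(?<=\\$)\\d+", input));
        // Negative lookahead: whole numbers NOT followed by " and" -> [12]
        System.out.println(findAll("\\d+\\b(?! and)", input));
        // Negative lookbehind: whole numbers NOT preceded by "$" -> [7]
        System.out.println(findAll("(?<!\\$)\\b\\d+", input));
    }
}
```

In every case only the digits end up in the match; the text checked by the lookaround is tested but not consumed.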
Lookarounds are non-capturing - their match is not placed in backreference groups like a standard (group), though they can contain backreferences.
Lookbehinds in Java* have a restriction that they cannot be of unlimited length - so you can't do (?<=\w+) but instead need to use numeric quantifiers with an upper bound, e.g. (?<=\w{1,99})
(*A couple of regex implementations don't have this restriction; though many have a stricter restriction of being fixed length.)
Lookaheads do not have such a restriction (though of course for performance reasons you should limit them to matching only what is required).