
Template processing - find variable references inside a string

Tags: java, string, regex

I'm building a very trivial template processor. It will only be able to substitute values of variables.

I thought I would first decompose the string into parts (constant parts and variable references). I would then replace all variable references with the corresponding values. Finally I would concatenate all the parts back together.


In order to decompose the string, I would need to slice it in the following way.

A string like this

"UPDATE {ix:tablename} SET value = value + 1 WHERE {ix:column} = {ix:value}"

should result in the following array

[
  "UPDATE ",
  "{ix:tablename}",
  " SET value = value + 1 WHERE ",
  "{ix:column}",
  " = ",
  "{ix:value}"
]

I know this could be done by repeatedly searching for the first opening bracket, then the first closing bracket, and so on. But isn't there a more elegant solution than that (some regexp magic, perhaps)?

Dušan Rychnovský asked Mar 22 '26 01:03



1 Answer

You can get the array you want with a regex split:

MyString.split("(?=\\{ix:)|(?<=\\})")

(The { and } need to be escaped as \{ and \} to be literal in regex, and because it's a Java string each \ needs to be further escaped as \\.)

That is: look ahead for {ix: or look behind for }, and split at that position if either is found.
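As a minimal sketch of how that split fits the plan described in the question (decompose, substitute, concatenate), something like the following could work. The class and method names, the sample values, and the choice to leave unknown variables untouched are assumptions made for illustration, not part of the original answer.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

class TemplateSplitDemo {
    // Split before every "{ix:" and after every "}" without consuming characters.
    static String[] splitTemplate(String template) {
        return template.split("(?=\\{ix:)|(?<=\\})");
    }

    // Replace each "{ix:name}" part with its value and join the parts back together.
    static String fill(String template, Map<String, String> values) {
        StringBuilder out = new StringBuilder();
        for (String part : splitTemplate(template)) {
            if (part.startsWith("{ix:") && part.endsWith("}")) {
                // Strip the 4-character "{ix:" prefix and the trailing "}".
                String name = part.substring(4, part.length() - 1);
                // Assumption: unknown variables are left in place unchanged.
                out.append(values.getOrDefault(name, part));
            } else {
                out.append(part);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String sql = "UPDATE {ix:tablename} SET value = value + 1 WHERE {ix:column} = {ix:value}";
        System.out.println(Arrays.toString(splitTemplate(sql)));

        Map<String, String> values = new HashMap<>();
        values.put("tablename", "counters"); // hypothetical sample values
        values.put("column", "id");
        values.put("value", "42");
        System.out.println(fill(sql, values));
    }
}
```

With the string from the question, splitTemplate should return the same six parts as the array shown there.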

If it is possible for } to be valid in other contexts, I would probably take a different approach.
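The answer does not say what that different approach would be. One possibility, offered purely as an assumption, is to match whole references with Pattern and Matcher, so that a stray } elsewhere in the text is simply never matched. The class name, the \w+ restriction on variable names, and the handling of unknown variables are all illustrative choices.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TemplateSubstituteDemo {
    // Matches a whole reference such as {ix:tablename}; group 1 is the name.
    // Assumption: variable names consist of word characters only.
    private static final Pattern REFERENCE = Pattern.compile("\\{ix:(\\w+)\\}");

    static String render(String template, Map<String, String> values) {
        Matcher m = REFERENCE.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = values.get(m.group(1));
            // Assumption: unknown references are kept verbatim (m.group() is the full match).
            String replacement = (value != null) ? value : m.group();
            // quoteReplacement keeps '$' and '\' in values from being treated specially.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out); // copy the text after the last reference
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> values = new HashMap<>();
        values.put("column", "id"); // hypothetical sample value
        System.out.println(render("WHERE note = '}' AND {ix:column} = 1", values));
    }
}
```

This skips the intermediate array entirely, which may or may not suit the rest of the processor.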

Lookarounds

An often forgotten aspect of regex, particularly when it comes to splitting, is that it can match positions as well as characters; these are known as zero-width matches.

Most people are familiar with positional matches, like ^ and \b, but fewer are well acquainted with lookarounds, which allow ad-hoc conditions to be specified.

When a regex contains only positional constructs, the match contains no characters, but the regex engine still records the position where it occurred. Most string operations just need a position and a length, and a length of 0 still allows a split (or replace) to occur at that position.
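A small sketch can make the "position plus zero length" idea concrete. The helper names here are hypothetical: one inserts a marker at every zero-width match, the other collects the offsets where matches start.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ZeroWidthDemo {
    // Insert a "|" at every position the pattern matches.
    static String mark(String input, String zeroWidthRegex) {
        return input.replaceAll(zeroWidthRegex, "|");
    }

    // Collect the start offset of every match.
    static List<Integer> positions(String input, String regex) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<Integer> starts = new ArrayList<>();
        while (m.find()) {
            // For a zero-width match, m.start() and m.end() are equal.
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        System.out.println(mark("foobar", "(?=bar)"));   // marker goes between "foo" and "bar"
        System.out.println(positions("hi there", "\\b")); // offsets of the word boundaries
    }
}
```

Nothing is removed from the input in either call, because every match has length 0.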

Lookaheads and lookbehinds allow you to match positions by specifying sub-expressions which are checked going forwards (ahead) and backwards (behind) in the string from the position they are being tested at.

In syntax terms, a lookahead looks like (?=subexpr) whilst a lookbehind looks like (?<=subexpr).

There are negated versions - for when the pattern must fail to be considered successful - which are (?!subexpr) and (?<!subexpr) respectively.
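To illustrate the four forms side by side, here is a hedged sketch using made-up sample strings. The firstMatch helper is hypothetical and simply returns the first match of a pattern, or null if there is none.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundFormsDemo {
    // Return the first match of regex in input, or null when nothing matches.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Lookahead: digits that are followed by '%'.
        System.out.println(firstMatch("\\d+(?=%)", "50% of 80"));
        // Negative lookahead: a whole number that is NOT followed by '%'.
        System.out.println(firstMatch("\\b\\d+\\b(?!%)", "50% of 80"));
        // Lookbehind: digits that are preceded by '$'.
        System.out.println(firstMatch("(?<=\\$)\\d+", "$5 and 7"));
        // Negative lookbehind: a number that is NOT preceded by '$'.
        System.out.println(firstMatch("(?<!\\$)\\b\\d+", "$5 and 7"));
    }
}
```

In each case the '%' or '$' is only checked, never included in the returned match.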

Lookarounds are non-capturing - their match is not placed in backreference groups like a standard (group), though they can contain backreferences.
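The following sketch, with hypothetical helper names and sample input, shows both halves of that statement: a pattern built only from lookarounds reports zero capturing groups, and a lookahead can still refer back to an ordinary group via \1.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundGroupsDemo {
    // Number of capturing groups the pattern defines.
    static int groupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    // Return the first match of regex in input, or null when nothing matches.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String withLookarounds = "(?<=\\{ix:)\\w+(?=\\})"; // no capturing groups
        String withGroup = "\\{ix:(\\w+)\\}";              // one capturing group

        System.out.println(groupCount(withLookarounds));
        System.out.println(groupCount(withGroup));

        // The braces and "ix:" are checked but are not part of the match.
        System.out.println(firstMatch(withLookarounds, "UPDATE {ix:tablename}"));

        // A backreference inside a lookahead: a character followed by itself.
        System.out.println(firstMatch("(\\w)(?=\\1)", "hello"));
    }
}
```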

Lookbehind Restrictions

Lookbehinds in Java* have a restriction that they cannot be of unlimited length, so you can't do (?<=\w+) but instead need to use numeric quantifiers with an upper bound, e.g. (?<=\w{1,99}).

(*A couple of regex implementations don't have this restriction, though many have the stricter restriction that lookbehinds must be fixed length.)
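As a sketch of a bounded lookbehind in practice, the split could be tightened so that it only breaks after a } that actually closes an {ix:...} reference. The 99-character limit on variable names and the class and method names are assumptions chosen for illustration.

```java
import java.util.Arrays;

class BoundedLookbehindDemo {
    // Split before "{ix:" and after a "}" only when that "}" ends a full reference.
    // \w{1,99} gives the lookbehind an upper bound on its length.
    static String[] splitStrict(String template) {
        return template.split("(?=\\{ix:)|(?<=\\{ix:\\w{1,99}\\})");
    }

    public static void main(String[] args) {
        // The stray "}" near the start is not treated as a split point.
        System.out.println(Arrays.toString(splitStrict("a } b {ix:x} c")));
    }
}
```

A variable name longer than the chosen bound would not be recognised by the lookbehind, so the bound needs to be at least as large as the longest name the templates can contain.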

Lookaheads do not have such a restriction (though of course for performance reasons you should limit them to matching only what is required).

Peter Boughton answered Mar 24 '26 14:03
