I'm just learning how to use regexes.
I'm reading in a text file that is split into sections of two different sorts, demarcated by <:==]:> and <:==}:>. I need to know for each section whether it's a ] or }, so I can't just do
Pattern.compile("<:==]:>|<:==}:>").split(text)
Doing this:
Pattern.compile("<:==").split(text)
works, and then I can just look at the first char in each substring, but this seems sloppy to me, and I think I'm only resorting to it because I'm not fully grasping something I need to grasp about regexes.
What would be the best practice here? Also, is there any way to split a string up while leaving the delimiter in the resulting strings, such that each begins with the delimiter?
EDIT: the file is laid out like this:
Old McDonald had a farm
<:==}:>
EIEIO. And on that farm he had a cow
<:==]:>
And on that farm he....
It may be a better idea not to use split() for this. You could instead do a match:
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// subjectString holds the full text of the file.
List<String> delimList = new ArrayList<String>();
List<String> sectionList = new ArrayList<String>();
Pattern regex = Pattern.compile(
    "(<:==[\\]}]:>)        # Match a delimiter, capture it in group 1.\n" +
    "(                     # Match and capture in group 2:\n" +
    " (?:                  # the following group which matches...\n" +
    "  (?!<:==[\\]}]:>)    # (unless we're at the start of another delimiter)\n" +
    "  .                   # any character\n" +
    " )*                   # any number of times.\n" +
    ")                     # End of group 2",
    Pattern.COMMENTS | Pattern.DOTALL);
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
    delimList.add(regexMatcher.group(1));    // the delimiter, "<:==]:>" or "<:==}:>"
    sectionList.add(regexMatcher.group(2));  // the text that follows it
}
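As for the second part of the question: if you do want to stay with split() and keep the delimiter at the start of each piece, one option is to split on a zero-width lookahead, so the split point is the empty position just before each delimiter and nothing is consumed. Here is a minimal sketch of that idea. The class and method names (SectionSplitter, splitKeepingDelimiters, kindOf) are made up for illustration, and the sample text is the layout from the question.

```java
import java.util.regex.Pattern;

public class SectionSplitter {
    // Zero-width lookahead: matches the empty position just before a delimiter,
    // so split() cuts there without consuming the delimiter itself.
    private static final Pattern BEFORE_DELIM = Pattern.compile("(?=<:==[\\]}]:>)");

    /** Splits text so that every piece following a split point starts with its delimiter. */
    static String[] splitKeepingDelimiters(String text) {
        return BEFORE_DELIM.split(text);
    }

    /** Returns ']' or '}' for a piece that starts with a delimiter, or ' ' otherwise. */
    static char kindOf(String piece) {
        if (piece.startsWith("<:==]:>")) return ']';
        if (piece.startsWith("<:==}:>")) return '}';
        return ' '; // text before the first delimiter (hypothetical convention)
    }

    public static void main(String[] args) {
        String text = "Old McDonald had a farm\n<:==}:>\n"
                + "EIEIO. And on that farm he had a cow\n<:==]:>\n"
                + "And on that farm he....";
        for (String piece : splitKeepingDelimiters(text)) {
            System.out.println("[" + kindOf(piece) + "] " + piece.trim());
        }
    }
}
```

Unlike the find() loop, this also hands back any text that comes before the first delimiter as the first piece. Which of the two reads better is probably a matter of taste; the match version gives you the delimiter and the section body already separated.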