
Regex best-practices

Tags:

java

regex

I'm just learning how to use regexes.

I'm reading in a text file that is split into sections of two different sorts, demarcated by <:==]:> and <:==}:>. I need to know for each section whether it's a ] or a }, so I can't just do:

Pattern.compile("<:==]:>|<:==}:>").split(text)

Doing this:

Pattern.compile("<:==").split(text)

works, and then I can just look at the first char in each substring, but this seems sloppy to me, and I think I'm only resorting to it because I'm not fully grasping something I need to grasp about regexes.

What would be the best practice here? Also, is there any way to split a string while leaving the delimiter in the resulting strings, such that each begins with the delimiter?

EDIT: the file is laid out like this:

Old McDonald had a farm 
<:==}:> 
EIEIO. And on that farm he had a cow 
<:==]:> 
And on that farm he....
drew moore asked Nov 22 '13 11:11



1 Answer

It may be a better idea not to use split() for this. You could instead do a match:

List<String> delimList = new ArrayList<String>();
List<String> sectionList = new ArrayList<String>();
Pattern regex = Pattern.compile(
    "(<:==[\\]}]:>)     # Match a delimiter, capture it in group 1.\n" +
    "(                  # Match and capture in group 2:\n" +
    " (?:               # the following group which matches...\n" +
    "  (?!<:==[\\]}]:>) # (unless we're at the start of another delimiter)\n" +
    "  .                # any character\n" +
    " )*                # any number of times.\n" +
    ")                  # End of group 2", 
    Pattern.COMMENTS | Pattern.DOTALL);
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
    delimList.add(regexMatcher.group(1));
    sectionList.add(regexMatcher.group(2));
} 
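
On the second part of the question (keeping the delimiter at the start of each piece), one option is to split on a zero-width lookahead, so that split() cuts just before each delimiter without consuming it. The following is a minimal sketch rather than a definitive solution: the class name DelimiterSplit and the method splitKeepingDelimiters are made up for illustration, and the sample text is the layout from the question. It assumes Java 8 or later, where a zero-width match at the very start of the input does not produce an empty leading string.

```java
import java.util.Arrays;
import java.util.List;

class DelimiterSplit {
    // Zero-width lookahead: matches the empty position just before a delimiter,
    // so split() consumes nothing and the delimiter stays at the start of the next piece.
    private static final String BEFORE_DELIMITER = "(?=<:==[\\]}]:>)";

    // Hypothetical helper: returns the pieces of text, each later piece beginning with its delimiter.
    static List<String> splitKeepingDelimiters(String text) {
        return Arrays.asList(text.split(BEFORE_DELIMITER));
    }

    public static void main(String[] args) {
        String text = "Old McDonald had a farm\n"
                + "<:==}:>\n"
                + "EIEIO. And on that farm he had a cow\n"
                + "<:==]:>\n"
                + "And on that farm he....";

        for (String section : splitKeepingDelimiters(text)) {
            // Text before the first delimiter has no marker; flag it with '-'.
            boolean marked = section.startsWith("<:==");
            char kind = marked ? section.charAt(4) : '-'; // index 4 holds ']' or '}'
            System.out.println(kind + " | " + section.trim());
        }
    }
}
```

Unlike the matching loop above, this keeps any text that appears before the first delimiter as its own piece, which may or may not be what you want for your file.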
Tim Pietzcker answered Oct 16 '22 08:10
