
Regular Expression to match all characters up to next match

I'm parsing text that is many repetitions of a simple pattern. The text is in the format of a script for a play, like this:

SAMPSON
I mean, an we be in choler, we'll draw.

GREGORY
Ay, while you live, draw your neck out o' the collar.

I'm currently using the pattern ([A-Z0-9\s]+)\s*\:?\s*[\r\n](.+)[\r\n]{2}, which works fine (explanation below) except for when the character's speech has line breaks in it. When that happens, the character's name is captured successfully but only the first line of the speech is captured.

Turning on Single-line mode (to include line breaks in .) just creates one giant match.

How can I tell the (.+) to stop when it finds the next character name and end the match?
I'm iterating over each match individually (JavaScript), so the name must be available to the next match.

Ideally, I would be able to match all characters until the entire pattern is repeated.


Pattern explained:

The first group matches a character's name (allowing capital letters, numbers, and whitespace), with an optional trailing colon and whitespace.
The second group (character's speech) begins on a new line and captures any characters (except, problematically, line breaks and characters after them).
The pattern ends (and starts over) after a blank line.

Nathan asked Nov 13 '22 06:11



1 Answer

Consider going in a different direction with this. What you really want is to split a larger dialogue on any line that contains a name. You can still do this with a regular expression (replace the regex with whatever will match the "speaker" line):

results = "Insert script here".split(/^([A-Z]+)$/m)

On a standards-compliant implementation, your example text will end up in an array like so:

results[0] = ""
results[1] = "SAMPSON"
results[2] = "\nI mean, an we be in choler, we'll draw.\n\n"
results[3] = "GREGORY"
results[4] = "\nAy, while you live, draw your neck out o' the collar."

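A caveat is that browser support for this is uneven: some older browsers (notably Internet Explorer 8 and earlier) omit captured groups from the array returned by split. You can use the XRegExp library to get consistent cross-browser behaviour.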
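
As a rough sketch of how the split result could be turned into name/speech pairs, something like the following should work. The names script, parts, and dialogue are made up for illustration, and so is the two-line speech for GREGORY. The pattern is only an assumption about what a "speaker" line looks like (capitals, digits, and spaces, with an optional trailing colon), so a speech line written entirely in capitals would be mistaken for a name.

```javascript
// Sample input; GREGORY's speech is split over two lines for illustration.
const script = [
  "SAMPSON",
  "I mean, an we be in choler, we'll draw.",
  "",
  "GREGORY",
  "Ay, while you live,",
  "draw your neck out o' the collar.",
].join("\n");

// The m flag makes ^ and $ match at line boundaries. The capture group
// keeps each speaker name in the array returned by split.
// A literal space is used instead of \s so a name cannot span lines.
const parts = script.split(/^([A-Z0-9 ]+):?[ \t]*$/m);

// parts[0] is whatever precedes the first name; after that, names and
// speeches alternate.
const dialogue = [];
for (let i = 1; i < parts.length; i += 2) {
  dialogue.push({
    name: parts[i].trim(),
    speech: (parts[i + 1] || "").trim(), // drop surrounding line breaks
  });
}

console.log(dialogue);
```

Each entry keeps the line breaks inside a speech, so multi-line speeches stay intact while the blank lines between speakers are trimmed away.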

Chris Pitman answered Nov 16 '22 02:11
