Capture string in regex replacement

From what I can gather from the Pharo documentation on regex, I can define a regular expression object such as:

re := '(foo|re)bar' asRegex

And I can replace the matched regex with a string via this:

re copy: 'foobar blah rebar' replacingMatchesWith: 'meh'

Which will result in: 'meh blah meh'.

So far, so good. But I want to replace only the 'bar' and leave the prefix alone. Therefore, I need a way to refer to the captured parenthesized group in the replacement string:

re copy: 'foobar blah rebar' replacingMatchesWith: '%1meh'

And I want the result: 'foomeh blah remeh'. However, this just gives me: '%1meh blah %1meh'. I also tried using \1, or \\1, or $1, or {1} and got the literal string replacement, e.g., '\1meh blah \1meh' as a result.

I can do this easily enough in GNU Smalltalk with:

'foobar blah rebar' replacingAllRegex: '(foo|re)bar' with: '%1meh'

But I can't find anywhere in the Pharo regex documentation that tells me how I can do this in Pharo. I've done a bunch of googling for Pharo regex as well, but not turned up anything. Is this capability part of the RxMatcher class or some other Pharo regex class?

asked May 24 '16 at 01:05 by lurker

1 Answer

After experimenting a bit with the RxMatcher class, I made the following modification to the RxMatcher#copyStream:to:replacingMatchesWith: method:

copyStream: aStream to: writeStream replacingMatchesWith: aString
    "Copy the contents of <aStream> on the <writeStream>,
     except for the matches. Replace each match with <aString>."

    | searchStart matchStart matchEnd |
    stream := aStream.
    markerPositions := nil.
    [searchStart := aStream position.
    self proceedSearchingStream: aStream] whileTrue: [
        matchStart := (self subBeginning: 1) first.
        matchEnd := (self subEnd: 1) first.
        aStream position: searchStart.
        searchStart to: matchStart - 1 do:
            [:ignoredPos | writeStream nextPut: aStream next].

        "------- The following line replaces: writeStream nextPutAll: aString ------"
        "Expand the {n} placeholders in the replacement string with the captured subexpressions"
        writeStream nextPutAll: (aString format: self subexpressionStrings).
        "-------"

        aStream position: matchEnd.
        "Be extra careful about successful matches which consume no input.
        After those, make sure to advance or finish if already at end."
        matchEnd = searchStart ifTrue: 
            [aStream atEnd
                ifTrue: [^self "rest after end of whileTrue: block is a no-op if atEnd"]
                ifFalse:    [writeStream nextPut: aStream next]]].
    aStream position: searchStart.
    [aStream atEnd] whileFalse: [writeStream nextPut: aStream next]

And then I added this method in the "accessing" category:

subexpressionStrings
   "Answer an array of the captured subexpression strings.
    Index 1 is the entire match, so start at 2.
    A group that captured nothing yields an empty string."
   | ws |
   ws := Array new writeStream.
   2 to: self subexpressionCount do: [ :n |
      ws nextPut: ((self subexpression: n) ifNil: [ '' ]) ].
   ^ ws contents

With this modification, I can refer to captured groups in the replacement string using the Smalltalk String#format: placeholder syntax, where {n} stands for the n-th parenthesized group:

re := '((foo|re)ba(r|m))' asRegex
re copy: 'foobar meh rebam' replacingMatchesWith: '{2}bu{3} (was {1})'

Results in:

'foobur (was foobar) meh rebum (was rebam)'
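
To see why the indices line up this way, here is a minimal Playground sketch, assuming a stock Pharo image (no patch required). It shows the two existing pieces the modification combines: String>>format: substitutes {n} with the n-th element of its argument, and RxMatcher>>subexpression: answers captured text, with index 1 reserved for the entire match. The literal array below is a hand-written stand-in for what subexpressionStrings is meant to answer for the first match.

```smalltalk
| re |

"String>>format: replaces {n} with the n-th element of the collection."
'{2}bu{3} (was {1})' format: #('foobar' 'foo' 'r').
"answers 'foobur (was foobar)'"

"After a successful search, subexpression: 1 is the entire match,
 so the parenthesized groups start at index 2."
re := '((foo|re)ba(r|m))' asRegex.
re search: 'foobar'.
re subexpression: 1.   "the entire match"
re subexpression: 3.   "the text captured by (foo|re)"
```

Because subexpressionStrings skips index 1, {1} in the replacement string refers to the first parenthesized group rather than to the entire match.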
answered Sep 21 '22 at 07:09 by lurker