It's well known that word's find and replace "wildcards" features suffer some severe limitations.
The following code implements true regex find and replace in a word document, and variations on it are found in other Stackoverflow and SuperUser questions.
Sub RegEx_PlainText(Before As String, After As String)
Dim regexp As Object
Set regexp = CreateObject("vbscript.regexp")
With regexp
.Pattern = Before
.IgnoreCase = True
.Global = True
'could be any Range , .Range.Text , or selection object
ActiveDocument.Range = .Replace(ActiveDocument.Range, After)
End With
End Sub
However, this wipes the document of all formatting.
Word will not preserve formatting character by character even if the strings are of the same length or indeed the same string, so ActiveDocument.Range = ActiveDocument.Range
or Selection.Text=Selection.Text
will wipe all formatting (or more accurately, format the whole range the same as the first character in the range, and add a carriage return). Upon reflection, this behavior isn't so surprising.
To solve this, the following code runs a regex find, then loops through the matches and runs .replace
only on the range where the match is found. This then, would only lose formatting if the match iself had a variety of formatting (for example an italicised word would be lost)
Hopefully the code comments make this quite transparent.
Sub RegEx(Before As String, After As String, _
Optional CaseSensitive As Boolean = False, _
Optional Location As Range = Nothing, _
Optional DebugMode As Boolean = False)
'can't declare activedocument.range in parameters
If Location Is Nothing Then Set Location = ActiveDocument.Range
Dim regexp As Object
Dim Foundmatches As Object
Dim Match As Object
Dim MatchRange As Range
Dim offset As Integer: offset = 0
Set regexp = CreateObject("vbscript.regexp")
With regexp
.Pattern = Before
.IgnoreCase = Not CaseSensitive
.Global = True
'set foundmatches to collection of all regex matches
Set Foundmatches = .Execute(Location.text)
For Each Match In Foundmatches
'set matchrange to location of found string in source doc.
'offset accounts for change in length of document from already completed replacements
Set MatchRange = Location.Document _
.Range(Match.FirstIndex + offset, _
Match.FirstIndex + Match.Length + offset)
'debugging
If DebugMode Then
Debug.Print "strfound = " & Match.Value
Debug.Print "matchpoint = " & Match.FirstIndex
Debug.Print "origstrlength = " & Match.Length
Debug.Print "offset = " & offset
Debug.Print "matchrange = " & MatchRange.text
MatchRange.Select
Stop
Else
'REAL LIFE
'run the regex replace just on the range containing the regex match
MatchRange = .Replace(MatchRange, After)
'increment offset to account for change in length of document
offset = offset + MatchRange.End - MatchRange.Start - Match.Length
End If
Next
End With
End Sub
This works on simple documents, but when I run it on a real document, matchrange
ends up being at some point near the where the match was found, but not exactly right. It's not predictably off, sometimes it is to the right, and sometimes to the left. Generally the more complex the document. (links, tables of context, formatting etc.) the more wrong it ends up being.
Does anyone know why this doesn't work, and how to fix it? If I could understand why this isn't working, then I might be able to determine whether this approach can be fixed, or if I just need to try a different method.
Code includes DebugMode param which means it will just loop through the doc and highlight all matches, performing no changes. Also outputs a bunch of stuff to the console. This should be helpful for anyone kind enough to tackle this problem with me.
https://calibre-ebook.com/downloads/demos/demo.docx Here is a sample document (not produced by me) which may be useful.
@Some_Guy: thanks for asking this question, I had a similar problem and your post saved me quite a bit of time.
This is the kludge I came up with:
Sub RegEx(Before As String, After As String, _
Optional CaseSensitive As Boolean = False, _
Optional Location As Range = Nothing, _
Optional DebugMode As Boolean = False)
'can't declare activedocument.range in parameters
If Location Is Nothing Then Set Location = ActiveDocument.Range
Dim j As Long
Dim regexp As Object
Dim Foundmatches As Object
Dim Match As Object
Dim MatchRange As Range
Dim offset As Integer: offset = 0
Set regexp = CreateObject("vbscript.regexp")
With regexp
.Pattern = Before
.IgnoreCase = Not CaseSensitive
.Global = True
'set foundmatches to collection of all regex matches
Set Foundmatches = .Execute(Location.Text)
For j = Foundmatches.Count - 1 To 0 Step -1
If DebugMode = True Then
'debugging
Debug.Print Foundmatches(j), .Replace(Foundmatches(j), After)
Else
'REAL LIFE
'run a plain old find/replace on the found string and eplace strings
With ActiveDocument.Range.Find
.ClearFormatting
.Replacement.ClearFormatting
.Replacement.Font.Hidden = True
.Text = Foundmatches(j)
.Replacement.Text = regexp.Replace(Foundmatches(j), After)
.Execute Replace:=wdReplaceAll
End With
End If
Next j
End With
End Sub
Basically I use a simple find/replace with strings that match each item found (and would be replaced) with a regex, would decent support for it exist in Word). Note that any text replaced takes on the formatting of the first replaced character, so if the first word is in bold, then all the replaced text will be bold.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With