Remove Hebrew vowels (nikkud) from selected Unicode Hebrew text

I want to select a string of Unicode Hebrew text in a Word document and remove the Hebrew vowels (aka nikkud) without changing anything else.

I need to remove Unicode characters in a given range from the selected text. The Unicode characters I want to remove are U+0591-U+05BD, U+05BF-U+05C2, and U+05C4-U+05C7.

I found a way to remove the Hebrew vowels from a Unicode text string using the REGEXREPLACE function in Google Sheets (thank you GitHub). For example:

=REGEXREPLACE(B1,"[\x{0591}-\x{05BD}\x{05BF}-\x{05C2}\x{05C4}-\x{05C7}]","")

where cell B1 contains the original Hebrew text with vowels, and the function outputs the identical text with the vowels removed. The ranges used there skip two characters that need to remain (U+05BE and U+05C3).
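As a cross-check outside Word and Sheets, the same three ranges can be written as one character class. The sketch below uses Python's standard `re` module purely as an illustration; it is not part of the original question, and the names `NIKKUD` and `strip_nikkud` are hypothetical. It assumes the text uses ordinary Hebrew letters followed by combining marks (not precomposed presentation forms).

```python
import re

# The three ranges from the question. U+05BE (maqaf) and U+05C3 (sof pasuq)
# fall in the gaps between the ranges, so they are left in place.
NIKKUD = re.compile("[\u0591-\u05BD\u05BF-\u05C2\u05C4-\u05C7]")


def strip_nikkud(text: str) -> str:
    """Return text with every character in the three ranges removed."""
    return NIKKUD.sub("", text)


# alef + qamats, mem + patah, resh  ->  alef, mem, resh
vowelled = "\u05D0\u05B8\u05DE\u05B7\u05E8"
print(strip_nikkud(vowelled) == "\u05D0\u05DE\u05E8")

# kaf + dagesh + qamats, lamed, maqaf  ->  kaf, lamed, maqaf (maqaf survives)
with_maqaf = "\u05DB\u05BC\u05B8\u05DC\u05BE"
print(strip_nikkud(with_maqaf) == "\u05DB\u05DC\u05BE")
```

Everything outside the three ranges, including English text and spaces, passes through unchanged, which is the behaviour wanted for a mostly English document.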

Using that method, I can copy a Hebrew text string, e.g., אָמַר יְהוָה, paste it into my Google Sheet, and then copy the output, אמר יהוה, and paste it over the original text. This is much slower than a macro in Word would be (there are hundreds of these Hebrew text strings that need to be fixed). The majority of the document is in English, with snippets of Hebrew, so I don't need a solution for converting a whole document.

A bit of searching suggests to me that a similar RegEx replace function exists for Word VBA, but I don't have sufficient programming knowledge to adapt this to my own needs.

Jonathan Potter asked Jun 13 '18 02:06


1 Answer

You can try this macro. It runs a wildcard Find/Replace over the selected text only, once per character range:

Sub RemoveHebrewVowels()
    Dim WildcardCollection(2) As String 'three patterns, indices 0 to 2
    Dim WildcardsPattern As Variant
    'Nothing selected: stop rather than search from the insertion point onward
    If Selection.Type = wdSelectionIP Then Exit Sub
    'A one-character class is enough with Replace All, so no {1;} quantifier
    '(its separator depends on the regional list-separator setting)
    'U+0591 to U+05BD
    WildcardCollection(0) = "[" & ChrW(1425) & "-" & ChrW(1469) & "]"
    'U+05BF to U+05C2
    WildcardCollection(1) = "[" & ChrW(1471) & "-" & ChrW(1474) & "]"
    'U+05C4 to U+05C7
    WildcardCollection(2) = "[" & ChrW(1476) & "-" & ChrW(1479) & "]"
    'Run one replace-all per pattern, limited to the selected text
    For Each WildcardsPattern In WildcardCollection
        With Selection.Range.Find
            .ClearFormatting
            .Replacement.ClearFormatting
            .Text = WildcardsPattern
            .Replacement.Text = ""
            .Forward = True
            .Wrap = wdFindStop 'do not continue past the selection
            .Format = False
            .MatchCase = False
            .MatchWholeWord = False
            .MatchWildcards = True
            .MatchSoundsLike = False
            .MatchAllWordForms = False
            .Execute Replace:=wdReplaceAll
        End With
    Next
End Sub
wp78de answered Nov 15 '22 08:11