PowerShell: search for a matching string in a Word document

I have a simple requirement: I need to search for a string in a Word document and, as the result, get the matching line or some words around the match in the document.

So far, I can successfully search for a string in a folder of Word documents, but the search only returns True / False depending on whether the string was found.

#ERROR REPORTING ALL
Set-StrictMode -Version latest
$path     = "c:\MORLAB"
$files    = Get-Childitem $path -Include *.docx,*.doc -Recurse | Where-Object { !($_.psiscontainer) }
$output   = "c:\wordfiletry.txt"
$application = New-Object -comobject word.application
$application.visible = $False
$findtext = "CRHPCD01"

Function getStringMatch
{
  # Loop through all *.doc and *.docx files found under $path
  Foreach ($file In $files)
  {
   # Open read-only: Documents.Open(FileName, ConfirmConversions, ReadOnly)
   $document = $application.documents.open($file.FullName,$false,$true)
   $range = $document.content
   # Find.Execute only reports whether the text was found (True/False)
   $wordFound = $range.find.execute($findText)

   if($wordFound) 
    { 
     # $(...) is needed so that .FullName is evaluated inside the string
     "$($file.FullName) has $wordFound" | Out-File $output -Append
    }

   # Close each document inside the loop, not only the last one
   $document.close()
  }
$application.quit()
}

getStringMatch
asked Nov 27 '14 at 11:11 by Yogesh


1 Answer

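#ERROR REPORTING ALL
Set-StrictMode -Version latest
$path     = "c:\Temp"
$files    = Get-Childitem $path -Include *.docx,*.doc -Recurse | Where-Object { !($_.psiscontainer) }
$output   = "c:\temp\wordfiletry.csv"
$application = New-Object -comobject word.application
$application.visible = $False
$findtext = "First"
$charactersAround = 30

Function getStringMatch
{
    # An array, not a hashtable: adding a PsCustomObject to @{} with += throws
    $results = @()

    # Loop through all *.doc and *.docx files found under $path
    Foreach ($file In $files)
    {
        # Open read-only: Documents.Open(FileName, ConfirmConversions, ReadOnly)
        $document = $application.documents.open($file.FullName,$false,$true)
        $range = $document.content

        # Up to $charactersAround characters on each side of the (escaped) search text,
        # so a match near the start or end of the document is still found
        $pattern = ".{0,$($charactersAround)}$([regex]::Escape($findtext)).{0,$($charactersAround)}"
        If($range.Text -match $pattern){
             $properties = @{
                File = $file.FullName
                Match = $findtext
                TextAround = $Matches[0] 
             }
             $results += New-Object -TypeName PsCustomObject -Property $properties
        }

        # Close each document before opening the next one
        $document.close()
    }

    If($results){
        $results | Export-Csv $output -NoTypeInformation
    }

    $application.quit()
}

getStringMatch

# Export-Csv only runs when something matched, so check that the file exists
If(Test-Path $output){ import-csv $output }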

There are a couple of ways to get what you want. A simple approach: since you already have the document's text in $range.Text, perform a regex match on it and return the matched text. Because the pattern also captures characters on either side of the search term, this addresses getting some words around the match in the document.
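Another way, sketched here under assumptions rather than tested against every Word version, is to let Word's own Find do the work. When Range.Find.Execute succeeds it does more than return True: it redefines the range to cover the found text, so the surrounding paragraph can be read by expanding a copy of that range. The helper name Get-WordMatchContext is hypothetical, and the numeric constants stand in for Word's WdFindWrap and WdUnits enumeration values (wdFindStop = 0, wdParagraph = 4).

```powershell
# Sketch only: assumes Word is installed and $Document is an open Word
# Document COM object. Get-WordMatchContext is a hypothetical helper name.
function Get-WordMatchContext {
    param(
        $Document,              # a Word Document COM object
        [string] $FindText
    )

    $wdFindStop  = 0   # WdFindWrap: stop searching at the end of the range
    $wdParagraph = 4   # WdUnits: expand to the whole paragraph

    $range = $Document.Content
    $find  = $range.Find
    $find.Forward = $true
    $find.Wrap    = $wdFindStop

    # Each successful Execute redefines $range to cover the found text,
    # and the next call continues searching after it.
    while ($find.Execute($FindText)) {
        $context = $range.Duplicate            # copy, so $range stays on the match
        [void] $context.Expand($wdParagraph)   # grow the copy to its paragraph
        [pscustomobject]@{
            Match     = $range.Text
            Paragraph = $context.Text.Trim()   # drop the trailing paragraph mark
        }
    }
}
```

Inside the Foreach loop it would be called as Get-WordMatchContext -Document $document -FindText $findtext, before $document.close(). Setting $wdParagraph to 3 (wdSentence) instead returns only the sentence around each hit.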

The variable $charactersAround sets how many characters to capture on each side of $findtext. I also thought the output was a better fit for a CSV file, so $results collects one object per matching file, built from a hashtable of properties, and those objects are exported to a CSV file at the end.

Be sure to change the variables for your own testing. Using a regex to locate the matches also makes the search easy to extend; note that -match stops at the first occurrence in each document, so only one excerpt per file is returned.
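Because -match stops at the first hit, a document with several occurrences yields only one excerpt. A minimal sketch of a hypothetical helper, Get-TextAround, swaps -match for the .NET [regex]::Matches method so that every occurrence is returned. It also uses {0,N} rather than exactly N characters, so a match near the start or end of the text is not missed, and [regex]::Escape, so search text such as C++ is treated literally.

```powershell
# Hypothetical helper (not part of the answer above): returns every occurrence
# of $FindText in $Text with up to $CharactersAround characters on each side.
function Get-TextAround {
    param(
        [Parameter(Mandatory)] [string] $Text,
        [Parameter(Mandatory)] [string] $FindText,
        [int] $CharactersAround = 30
    )

    # Escape() makes the search text literal; {0,N} still matches when the
    # term sits close to the start or end of the text.
    $pattern = ".{0,$($CharactersAround)}" + [regex]::Escape($FindText) + ".{0,$($CharactersAround)}"

    # [regex]::Matches returns all non-overlapping matches, unlike -match,
    # which stops at the first one and stores it in $Matches.
    foreach ($m in [regex]::Matches($Text, $pattern, 'IgnoreCase')) {
        $m.Value
    }
}

Get-TextAround -Text "alpha First beta gamma First delta" -FindText "First" -CharactersAround 6
# -> "alpha First beta " and "gamma First delta"
```

In the answer's loop, Get-TextAround -Text $range.Text -FindText $findtext -CharactersAround $charactersAround would give one excerpt per occurrence. Occurrences closer together than the context width are merged into a single excerpt, because regex matches do not overlap.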

Sample Output

Match TextAround                                                        File                          
----- ----------                                                        ----                          
First dley Air Services Limited dba First Air meets or exceeds all term C:\Temp\20120315132117214.docx
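In the sample output the columns appear as Match, TextAround, File rather than in the order the properties were written, because a plain @{} hashtable does not preserve key order and New-Object -Property receives the keys in that arbitrary order. On PowerShell 3.0 and later, a [pscustomobject] literal keeps the properties in the order they are written. A sketch with placeholder values (not real search results); Export-Csv writes to a file the same CSV text that ConvertTo-Csv returns here:

```powershell
# Placeholder data only; in the answer's script these values would come from
# $file.FullName, $findtext and $Matches[0].
$results = @(
    [pscustomobject]@{ File = 'C:\Temp\a.docx'; Match = 'First'; TextAround = 'the First line' }
    [pscustomobject]@{ File = 'C:\Temp\b.docx'; Match = 'First'; TextAround = 'First of all' }
)

# The CSV header follows the property order written above: File, Match, TextAround
$csv = $results | ConvertTo-Csv -NoTypeInformation
$csv
```

In the answer's loop this would mean replacing the $properties hashtable and New-Object call with $results += [pscustomobject]@{ File = $file.FullName; Match = $findtext; TextAround = $Matches[0] }.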
answered Sep 27 '22 at 23:09 by Matt