Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to improve the performance of Write-Progress?

I'm writing a script that takes an output file from another platform (that sadly doesn't produce CSV output, instead it's around 7 lines per record), grabbing the lines that have the values I'm interested in (using select-string) and then scanning the MatchInfo array, extracting the exact text and building an array as I go, to export to CSV when finished.

My problem is that the original file has around 94000 lines of text, and the matchinfo object still has around 23500 records in it, so it takes a while, especially building the array, so I thought I'd throw in a Write-Progress but the overhead in doing so is quite horrific, it increases the elapsed time x8 vs not having the progress bar.

Here's an example entry from the original file:

CREATE   TRANCODE   MPF               OF TXOLID
AGENDA                   = T4XCLCSHINAG
,ANY_SC_LIST              = NONE    ,EVERY_SC_LIST            = NONE
,SECURITY_CATEGORY        = NONE    ,FUNCTION                 = 14
,TRANCODE_VALUE           = "MPF"
,TRANCODE_FUNCTION_MNEMONIC = NONE
,INSTALLATION_DATA        = NONE
;

Now, for each of these, I only care about the values of AGENDA and TRANCODE_VALUE, so having read the file in using Get-Content, I then use Select-String as the most efficient way I know to filter out the rest of the lines in the file:

rv Start,Filtered,count,CSV
Write-Host "Reading Mainframe Extract File"
$Start = gc K:\TRANCODES.txt
Write-Host ("Read Complete : " + $Start.Count + " records found")

Write-Host "Filtering records for AGENDA/TRANCODE information"
$Filtered = $Start|Select-String -Pattern "AGENDA","TRANCODE_VALUE"
Write-Host ([String]($Filtered.Count/2) + " AGENDA/TRANCODE pairs found")

This leaves me with an object of type Microsoft.PowerShell.Commands.MatchInfo with contents like:

AGENDA                   = T4XCLCSHINAG
,TRANCODE_VALUE           = "MPF"
AGENDA                   = T4XCLCSHINAG
,TRANCODE_VALUE           = "MP"

Now that Select-String only took around 9 seconds, so no real need for a progress bar there.

However, the next step, grabbing the actual values (after the =) and putting in an array takes over 30 seconds, so I figured a Write-Progress is helpful to the user and at least shows that something is actually happening, but, the addition of the progress bar seriously extends the elapsed time, see the following output from Measure-Command:

Measure-Command{$Filtered|foreach {If ($_.ToString() -Match 'AGENDA'){$obj = $null;  
                                    $obj = New-Object System.Object;  
                                    $obj | Add-Member -type NoteProperty -name AGENDA -Value $_.ToString().SubString(27)}  
            If ($_.ToString() -Match 'TRANCODE_VALUE'){$obj | Add-Member -type NoteProperty -name TRANCODE -Value ($_.ToString().SubString(28)).Replace('"','');
                                            $CSV += $obj;
                                            $obj = $null}
                           <#$count++     
                           Write-Progress `
                           -Activity "Building table of values from filter results" `
                           -Status ("Processed " + $count + " of " + $Filtered.Count + " records") `
                           -Id 1 `
                           -PercentComplete ([int]($count/$Filtered.Count *100))#>
                        }}



TotalSeconds      : 32.7902523   

So that's 717.2308630680085 records/sec

Measure-Command{$Filtered|foreach {If ($_.ToString() -Match 'AGENDA'){$obj = $null;  
                                    $obj = New-Object System.Object;  
                                    $obj | Add-Member -type NoteProperty -name AGENDA -Value $_.ToString().SubString(27)}  
            If ($_.ToString() -Match 'TRANCODE_VALUE'){$obj | Add-Member -type NoteProperty -name TRANCODE -Value ($_.ToString().SubString(28)).Replace('"','');
                                            $CSV += $obj;
                                            $obj = $null}
                           $count++     
                           Write-Progress `
                           -Activity "Building table of values from filter results" `
                           -Status ("Processed " + $count + " of " + $Filtered.Count + " records") `
                           -Id 1 `
                           -PercentComplete ([int]($count/$Filtered.Count *100))
                        }}


TotalSeconds      : 261.3469632  

Now only a paltry 89.98660799693897 records/sec

Any ideas how to improve the efficiency?

Here's the full script as-is:

rv Start,Filtered,count,CSV  
Write-Host "Reading Mainframe Extract File"  
$Start = gc K:\TRANCODES.txt  
Write-Host ("Read Complete : " + $Start.Count + " records found")  

Write-Host "Filtering records for AGENDA/TRANCODE information"  
$Filtered = $Start|Select-String -Pattern "AGENDA","TRANCODE_VALUE"  
Write-Host ([String]($Filtered.Count/2) + " AGENDA/TRANCODE pairs found")  

Write-Host "Building table from the filter results"  
[int]$count = 0  
$CSV = @()  
$Filtered|foreach {If ($_.ToString() -Match 'AGENDA'){$obj = $null;  
                                    $obj = New-Object System.Object;  
                                    $obj | Add-Member -type NoteProperty -name AGENDA -Value $_.ToString().SubString(27)}  
            If ($_.ToString() -Match 'TRANCODE_VALUE'){$obj | Add-Member -type NoteProperty -name TRANCODE -Value ($_.ToString().SubString(28)).Replace('"','');  
                                            $CSV += $obj;  
                                            $obj = $null}  
                           $count++     
                           Write-Progress `  
                           -Activity "Building table of values from filter results" `  
                           -Status ("Processed " + $count + " of " + $Filtered.Count + " records") `  
                           -Id 1 `  
                           -PercentComplete ([int]($count/$Filtered.Count *100))  
                        }  
     Write-Progress `  
     -Activity "Building table of values from filter results" `  
     -Status ("Table built : " + $CSV.Count + " rows created") `  
     -Id 1 `  
     -Completed  

Write-Host ("Table built : " + $CSV.Count + " rows created")  

Write-Host "Sorting and Exporting table to CSV file"  

$CSV|Select TRANCODE,AGENDA|Sort TRANCODE |Export-CSV -notype K:\TRANCODES.CSV  

Here's output from script with the write-progress commented out:

Reading Mainframe Extract File
Read Complete : 94082 records found
Filtering records for AGENDA/TRANCODE information
11759 AGENDA/TRANCODE pairs found
Building table from the filter results
Table built : 11759 rows created
Sorting and Exporting table to CSV file

TotalSeconds      : 75.2279182  

EDIT: I've adopted a modified version of the answer from @RomanKuzmin, so the appropriate code section now looks like:

Write-Host "Building table from the filter results"
[int]$count = 0
$CSV = @()
$sw = [System.Diagnostics.Stopwatch]::StartNew()
$Filtered|foreach {If ($_.ToString() -Match 'AGENDA'){$obj = $null;
                                        $obj = New-Object System.Object;
                                        $obj | Add-Member -type NoteProperty -name AGENDA -Value $_.ToString().SubString(27)}
                If ($_.ToString() -Match 'TRANCODE_VALUE'){$obj | Add-Member -type NoteProperty -name TRANCODE -Value ($_.ToString().SubString(28)).Replace('"','');
                                                $CSV += $obj;
                                                $obj = $null}
                               $count++     
                               If ($sw.Elapsed.TotalMilliseconds -ge 500) {
                               Write-Progress `
                               -Activity "Building table of values from filter results" `
                               -Status ("Processed " + $count + " of " + $Filtered.Count + " records") `
                               -Id 1 `
                               -PercentComplete ([int]($count/$Filtered.Count *100));
                               $sw.Reset();
                               $sw.Start()}
                            }
         Write-Progress `
         -Activity "Building table of values from filter results" `
         -Status ("Table built : " + $CSV.Count + " rows created") `
         -Id 1 `
         -Completed

And running the entire script through Measure-Command gives elapsed time of 75.2279182 seconds with no write-progress and with the modified write-progress using @RomanKuzmin suggestion, 76.525382 seconds - not bad at all!! :-)

like image 726
Graham Gold Avatar asked Jan 23 '14 09:01

Graham Gold


1 Answers

In such cases when progress is called too often I use this approach

# fast even with Write-Progress
$sw = [System.Diagnostics.Stopwatch]::StartNew()
for($e = 0; $e -lt 1mb; ++$e) {
    if ($sw.Elapsed.TotalMilliseconds -ge 500) {
        Write-Progress -Activity Test -Status "Done $e"
        $sw.Reset(); $sw.Start()
    }
}

# very slow due to Write-Progress
for($e = 0; $e -lt 1mb; ++$e) {
    Write-Progress -Activity Test -Status "Done $e"
}

Here is the suggestion on Connect....

like image 65
Roman Kuzmin Avatar answered Sep 28 '22 16:09

Roman Kuzmin