I'm writing a script that takes an output file from another platform (that sadly doesn't produce CSV output, instead it's around 7 lines per record), grabbing the lines that have the values I'm interested in (using select-string
) and then scanning the MatchInfo
array, extracting the exact text and building an array as I go, to export to CSV when finished.
My problem is that the original file has around 94000 lines of text, and the matchinfo object still has around 23500 records in it, so it takes a while, especially building the array, so I thought I'd throw in a Write-Progress
but the overhead in doing so is quite horrific, it increases the elapsed time x8 vs not having the progress bar.
Here's an example entry from the original file:
CREATE TRANCODE MPF OF TXOLID
AGENDA = T4XCLCSHINAG
,ANY_SC_LIST = NONE ,EVERY_SC_LIST = NONE
,SECURITY_CATEGORY = NONE ,FUNCTION = 14
,TRANCODE_VALUE = "MPF"
,TRANCODE_FUNCTION_MNEMONIC = NONE
,INSTALLATION_DATA = NONE
;
Now, for each of these, I only care about the values of AGENDA
and TRANCODE_VALUE
, so having read the file in using Get-Content
, I then use Select-String
as the most efficient way I know to filter out the rest of the lines in the file:
rv Start,Filtered,count,CSV
Write-Host "Reading Mainframe Extract File"
$Start = gc K:\TRANCODES.txt
Write-Host ("Read Complete : " + $Start.Count + " records found")
Write-Host "Filtering records for AGENDA/TRANCODE information"
$Filtered = $Start|Select-String -Pattern "AGENDA","TRANCODE_VALUE"
Write-Host ([String]($Filtered.Count/2) + " AGENDA/TRANCODE pairs found")
This leaves me with an object of type Microsoft.PowerShell.Commands.MatchInfo
with contents like:
AGENDA = T4XCLCSHINAG
,TRANCODE_VALUE = "MPF"
AGENDA = T4XCLCSHINAG
,TRANCODE_VALUE = "MP"
Now that Select-String only took around 9 seconds, so no real need for a progress bar there.
However, the next step, grabbing the actual values (after the =
) and putting in an array takes over 30 seconds, so I figured a Write-Progress
is helpful to the user and at least shows that something is actually happening, but, the addition of the progress bar seriously extends the elapsed time, see the following output from Measure-Command
:
Measure-Command{$Filtered|foreach {If ($_.ToString() -Match 'AGENDA'){$obj = $null;
$obj = New-Object System.Object;
$obj | Add-Member -type NoteProperty -name AGENDA -Value $_.ToString().SubString(27)}
If ($_.ToString() -Match 'TRANCODE_VALUE'){$obj | Add-Member -type NoteProperty -name TRANCODE -Value ($_.ToString().SubString(28)).Replace('"','');
$CSV += $obj;
$obj = $null}
<#$count++
Write-Progress `
-Activity "Building table of values from filter results" `
-Status ("Processed " + $count + " of " + $Filtered.Count + " records") `
-Id 1 `
-PercentComplete ([int]($count/$Filtered.Count *100))#>
}}
TotalSeconds : 32.7902523
So that's 717.2308630680085 records/sec
Measure-Command{$Filtered|foreach {If ($_.ToString() -Match 'AGENDA'){$obj = $null;
$obj = New-Object System.Object;
$obj | Add-Member -type NoteProperty -name AGENDA -Value $_.ToString().SubString(27)}
If ($_.ToString() -Match 'TRANCODE_VALUE'){$obj | Add-Member -type NoteProperty -name TRANCODE -Value ($_.ToString().SubString(28)).Replace('"','');
$CSV += $obj;
$obj = $null}
$count++
Write-Progress `
-Activity "Building table of values from filter results" `
-Status ("Processed " + $count + " of " + $Filtered.Count + " records") `
-Id 1 `
-PercentComplete ([int]($count/$Filtered.Count *100))
}}
TotalSeconds : 261.3469632
Now only a paltry 89.98660799693897 records/sec
Any ideas how to improve the efficiency?
Here's the full script as-is:
rv Start,Filtered,count,CSV
Write-Host "Reading Mainframe Extract File"
$Start = gc K:\TRANCODES.txt
Write-Host ("Read Complete : " + $Start.Count + " records found")
Write-Host "Filtering records for AGENDA/TRANCODE information"
$Filtered = $Start|Select-String -Pattern "AGENDA","TRANCODE_VALUE"
Write-Host ([String]($Filtered.Count/2) + " AGENDA/TRANCODE pairs found")
Write-Host "Building table from the filter results"
[int]$count = 0
$CSV = @()
$Filtered|foreach {If ($_.ToString() -Match 'AGENDA'){$obj = $null;
$obj = New-Object System.Object;
$obj | Add-Member -type NoteProperty -name AGENDA -Value $_.ToString().SubString(27)}
If ($_.ToString() -Match 'TRANCODE_VALUE'){$obj | Add-Member -type NoteProperty -name TRANCODE -Value ($_.ToString().SubString(28)).Replace('"','');
$CSV += $obj;
$obj = $null}
$count++
Write-Progress `
-Activity "Building table of values from filter results" `
-Status ("Processed " + $count + " of " + $Filtered.Count + " records") `
-Id 1 `
-PercentComplete ([int]($count/$Filtered.Count *100))
}
Write-Progress `
-Activity "Building table of values from filter results" `
-Status ("Table built : " + $CSV.Count + " rows created") `
-Id 1 `
-Completed
Write-Host ("Table built : " + $CSV.Count + " rows created")
Write-Host "Sorting and Exporting table to CSV file"
$CSV|Select TRANCODE,AGENDA|Sort TRANCODE |Export-CSV -notype K:\TRANCODES.CSV
Here's output from script with the write-progress
commented out:
Reading Mainframe Extract File
Read Complete : 94082 records found
Filtering records for AGENDA/TRANCODE information
11759 AGENDA/TRANCODE pairs found
Building table from the filter results
Table built : 11759 rows created
Sorting and Exporting table to CSV file
TotalSeconds : 75.2279182
EDIT: I've adopted a modified version of the answer from @RomanKuzmin, so the appropriate code section now looks like:
Write-Host "Building table from the filter results"
[int]$count = 0
$CSV = @()
$sw = [System.Diagnostics.Stopwatch]::StartNew()
$Filtered|foreach {If ($_.ToString() -Match 'AGENDA'){$obj = $null;
$obj = New-Object System.Object;
$obj | Add-Member -type NoteProperty -name AGENDA -Value $_.ToString().SubString(27)}
If ($_.ToString() -Match 'TRANCODE_VALUE'){$obj | Add-Member -type NoteProperty -name TRANCODE -Value ($_.ToString().SubString(28)).Replace('"','');
$CSV += $obj;
$obj = $null}
$count++
If ($sw.Elapsed.TotalMilliseconds -ge 500) {
Write-Progress `
-Activity "Building table of values from filter results" `
-Status ("Processed " + $count + " of " + $Filtered.Count + " records") `
-Id 1 `
-PercentComplete ([int]($count/$Filtered.Count *100));
$sw.Reset();
$sw.Start()}
}
Write-Progress `
-Activity "Building table of values from filter results" `
-Status ("Table built : " + $CSV.Count + " rows created") `
-Id 1 `
-Completed
And running the entire script through Measure-Command
gives elapsed time of 75.2279182 seconds with no write-progress
and with the modified write-progress
using @RomanKuzmin suggestion, 76.525382 seconds - not bad at all!! :-)
In such cases when progress is called too often I use this approach
# fast even with Write-Progress
$sw = [System.Diagnostics.Stopwatch]::StartNew()
for($e = 0; $e -lt 1mb; ++$e) {
if ($sw.Elapsed.TotalMilliseconds -ge 500) {
Write-Progress -Activity Test -Status "Done $e"
$sw.Reset(); $sw.Start()
}
}
# very slow due to Write-Progress
for($e = 0; $e -lt 1mb; ++$e) {
Write-Progress -Activity Test -Status "Done $e"
}
Here is the suggestion on Connect....
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With