 

Need to make a PowerShell script faster

I taught myself PowerShell, so I do not know everything about it.

I need to search a database with the exact number of lines I have put in (the database is predefined); it contains over 11,800 entries.

Can you please help me find what is making this slow?

Code:

$Dict = Get-Content "C:\Users\----\Desktop\Powershell Program\US.txt"

if($Right -ne "") {
    $Comb = $Letter + $Right
    $total = [int]0    
    $F = ""

    do {
        $F = $Dict | Select-Object -Index $total
        if($F.Length -eq $Num) {
            if($F.Chars("0") + $F.Chars("1") -eq $Comb) {
                Add-Content "C:\Users\----\Desktop\Powershell Program\Results.txt" "$F"
            }
        }
        $total++
        Write-Host $total
    } until([int]$total -gt [int]118619)

    $total = [int]0
    $F = ""
}

How do I speed up this line-by-line searching/matching process? Should I use multi-threading? If so, how?

Katie Rivera asked Dec 06 '15


2 Answers

It seems like you knew at least one other language before PowerShell, and are starting out by replicating what you might have done in that language in this one. That's a great way to learn a new language, but in the beginning you can end up with approaches that are a bit strange or not performant.

So first I want to break down what your code is actually doing, as a rough overview:

  1. Read every line of the file at once and store it in the $Dict variable.
  2. Loop the same number of times as there are lines.
  3. In each iteration of the loop:
    1. Get the single line whose position matches the loop counter (essentially through yet another full iteration, rather than indexing; more on that later).
    2. Get the first character of the line, then the second, then combine them.
    3. If that's equal to a pre-determined string, append this line to a text file.

Step 3-1 is what's really slowing this down

To understand why, you need to know a little bit about pipelines in PowerShell. Cmdlets that accept and work on pipelines take one or more objects, but they process a single object at a time. They don't even have access to the rest of the pipeline.

This is also true for the Select-Object cmdlet. So when you take an array with 18,500 objects in it and pipe it into Select-Object -Index 18000, 18,000 objects have to be sent in for inspection/processing before it can give you the one you want. You can see how the time taken gets longer and longer the larger the index is.

Since you already have an array, you can directly access any array member by index with square brackets [], like so:

$Dict[18000]

For a given array, that takes the same amount of time no matter what the index is.

Now for a single call to Select-Object -Index you probably aren't going to notice how long it takes, even with a very large index; the problem is that you're looping through the entire array already, so this is compounding greatly.

You're essentially doing the sum of 1..18,000 pipeline iterations, which is approximately 162,000,000 iterations in total! (thanks to user2460798 for correcting my math)
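To make the fix concrete, here is a sketch of the loop from the question rewritten to use direct indexing. It reuses the question's variables and file paths, and combines the first two characters with Substring, which I'm assuming matches the intent of the Chars calls:

$Dict = Get-Content "C:\Users\----\Desktop\Powershell Program\US.txt"

for ($i = 0; $i -lt $Dict.Count; $i++) {
    # Constant-time index lookup instead of Select-Object -Index
    $F = $Dict[$i]
    if ($F.Length -eq $Num -and $F.Substring(0, 2) -eq $Comb) {
        Add-Content "C:\Users\----\Desktop\Powershell Program\Results.txt" "$F"
    }
}

(Calling Add-Content once per match also costs time; collecting the matches and writing them out in one call at the end would be faster still.)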

Proof

I tested this. First, I created an array with 19,000 objects:

$a = 1..19000 | %{"zzzz~$_"}

Then I measured both methods of accessing it. First, with select -index:

measure-command { 1..19000 | % { $a | select -Index ($_-1 ) } | out-null }

Result:

TotalMinutes      : 20.4383861316667
TotalMilliseconds : 1226303.1679

Then with the indexing operator ([]):

measure-command { 1..19000 | % { $a[$_-1] } | out-null }

Result:

TotalMinutes      : 0.00788774666666667
TotalMilliseconds : 473.2648

The results are pretty striking: it takes nearly 2,600 times longer to use Select-Object.

A counting loop

The above is the single thing causing your major slowdown, but I wanted to point out something else.

Typically, in most languages, you would use a for loop to count. In PowerShell it looks like this:

for ($i = 0; $i -lt $total ; $i++) {
    # $i has the value of the iteration
}

In short, there are three statements in the for loop. The first is an expression that gets run before the loop starts. $i = 0 initializes the iterator to 0, which is the typical usage of this first statement.

Next is the conditional; it is tested on each iteration, and the loop continues as long as it returns true. Here $i -lt $total checks whether $i is less than the value of $total, some other variable defined elsewhere, presumably the maximum value.

The last statement gets executed on each iteration of the loop. $i++ is the same as $i = $i + 1 so in this case we're incrementing $i on each iteration.

It's a bit more concise than using a do/until loop, and it's easier to follow because the meaning of a for loop is well known.
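Applied to the code in the question, the counting do/until could be replaced with a for loop along these lines (a sketch using the question's variables; the body is abbreviated):

for ($total = 0; $total -le 118619; $total++) {
    $F = $Dict[$total]
    # ... same length and first-two-character checks as before ...
}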

Other Notes

If you're interested in more feedback about working code you've written, have a look at Code Review. Please read the rules there carefully before posting.

briantist answered Sep 28 '22


To my surprise, using the array's GetEnumerator method is faster than indexing; it takes about 5/8 of the time of indexing. However, this test is pretty unrealistic, in that the body of each loop is about as small as it can be.

$size = 64kb

$array = New-Object int[] $size
# Initializing the array takes quite a bit of time compared to the loops below
0..($size-1) | % { $array[$_] = get-random}

write-host `n`nMeasure using indexing
[uint64]$sum = 0
Measure-Command {
  for ($ndx = 0; $ndx -lt $size; $ndx++) {
    $sum += $array[$ndx]
  }
}
write-host Average = ($sum / $size)

write-host `n`nMeasure using array enumerator
[uint64]$sum = 0
Measure-Command {
  foreach ($element in $array.GetEnumerator()) {
    $sum += $element
  }
}
write-host Average = ($sum / $size)



Measure using indexing


Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 0
Milliseconds      : 898
Ticks             : 8987213
TotalDays         : 1.04018668981481E-05
TotalHours        : 0.000249644805555556
TotalMinutes      : 0.0149786883333333
TotalSeconds      : 0.8987213
TotalMilliseconds : 898.7213

Average = 1070386366.9346


Measure using array enumerator
Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 0
Milliseconds      : 559
Ticks             : 5597112
TotalDays         : 6.47813888888889E-06
TotalHours        : 0.000155475333333333
TotalMinutes      : 0.00932852
TotalSeconds      : 0.5597112
TotalMilliseconds : 559.7112

Average = 1070386366.9346

Code for these two in assembler might look like

;       Using Indexing
mov     esi, <addr of array>
xor     ebx, ebx
lea     edi, <addr of $sum>
loop:
mov     eax, dword ptr [esi][ebx*4]
add     dword ptr [edi], eax
inc     ebx
cmp     ebx, 65536
jl      loop

;       Using enumerator
mov     esi, <addr of array>
lea     edx, [esi + 65536*4]
lea     edi, <addr of $sum>
loop:
mov     eax, dword ptr [esi]
add     dword ptr [edi], eax
add     esi, 4
cmp     esi, edx
jl      loop

The only difference is in the first mov instruction in the loop, with one using an index register and the other not. I kind of doubt that would explain the observed difference in speed. I guess the JITter must add additional overhead.

Χpẘ answered Sep 28 '22