Problem: I have huge amount of sql queries (around 10k-20k) and I want to run them asynchronous in 50 (or more) threads.
I wrote a powershell script for this job, but it is very slow (It took about 20 hours to execute all). Desired result is 3-4 hours max.
Question: How can I optimize this powershell script? Should I reconsider and use another technology like python or c#?
I think it's powershell issue, because when I check with whoisactive the queries are executing fast. Creating, exiting and unloading jobs takes a lot of time, because for each thread is created separate PS instances.
My code:
$NumberOfParallerThreads = 50;
$Arr_AllQueries = @('Exec [mystoredproc] @param1=1, @param2=2',
                    'Exec [mystoredproc] @param1=11, @param2=22',
                    'Exec [mystoredproc] @param1=111, @param2=222')
#Creating the batches
$counter = [pscustomobject] @{ Value = 0 };
$Batches_AllQueries = $Arr_AllQueries | Group-Object -Property { 
    [math]::Floor($counter.Value++ / $NumberOfParallerThreads) 
};
forEach ($item in $Batches_AllQueries) {
    $tmpBatch = $item.Group;
    $tmpBatch | % {
        $ScriptBlock = {
            # accept the loop variable across the job-context barrier
            param($query) 
            # Execute a command
            Try 
            {
                Write-Host "[processing '$query']"
                $objConnection = New-Object System.Data.SqlClient.SqlConnection;
                $objConnection.ConnectionString = 'Data Source=...';
                $ObjCmd = New-Object System.Data.SqlClient.SqlCommand;
                $ObjCmd.CommandText = $query;
                $ObjCmd.Connection = $objConnection;
                $ObjCmd.CommandTimeout = 0;
                $objAdapter = New-Object System.Data.SqlClient.SqlDataAdapter;
                $objAdapter.SelectCommand = $ObjCmd;
                $objDataTable = New-Object System.Data.DataTable;
                $objAdapter.Fill($objDataTable)  | Out-Null;
                $objConnection.Close();
                $objConnection = $null;
            } 
            Catch 
            { 
                $ErrorMessage = $_.Exception.Message
                $FailedItem = $_.Exception.ItemName
                Write-Host "[Error processing: $($query)]" -BackgroundColor Red;
                Write-Host $ErrorMessage 
            }
        }
        # pass the loop variable across the job-context barrier
        Start-Job $ScriptBlock -ArgumentList $_ | Out-Null
    }
    # Wait for all to complete
    While (Get-Job -State "Running") { Start-Sleep 2 }
    # Display output from all jobs
    Get-Job | Receive-Job | Out-Null
    # Cleanup
    Remove-Job *
}
UPDATE:
Resources: The DB server is on a remote machine with:
We want to use the maximum cpu power.
Framework limitation: The only limitation is not to use SQL Server to execute the queries. The requests should come from outside source like: Powershell, C#, Python, etc.
SQL Server allows applications to perform asynchronous database operations. Asynchronous processing enables methods to return immediately without blocking on the calling thread.
RunspacePool is the way to go here, try this:
$AllQueries = @( ... )
$MaxThreads = 5
# Each thread keeps its own connection but shares the query queue
$ScriptBlock = {
    Param($WorkQueue)
    $objConnection = New-Object System.Data.SqlClient.SqlConnection
    $objConnection.ConnectionString = 'Data Source=...'
    $objCmd = New-Object System.Data.SqlClient.SqlCommand
    $objCmd.Connection = $objConnection
    $objCmd.CommandTimeout = 0
    $query = ""
    while ($WorkQueue.TryDequeue([ref]$query)) {
        $objCmd.CommandText = $query
        $objAdapter = New-Object System.Data.SqlClient.SqlDataAdapter $objCmd
        $objDataTable = New-Object System.Data.DataTable
        $objAdapter.Fill($objDataTable) | Out-Null
    }
    $objConnection.Close()
}
# create a pool
$pool = [RunspaceFactory]::CreateRunspacePool(1, $MaxThreads)
$pool.ApartmentState  = 'STA'
$pool.Open()
# convert the query array into a concurrent queue
$workQueue = New-Object System.Collections.Concurrent.ConcurrentQueue[object]
$AllQueries | % { $workQueue.Enqueue($_) }
$threads = @()
# Create each powershell thread and add them to the pool
1..$MaxThreads | % {
    $ps = [powershell]::Create()
    $ps.RunspacePool = $pool
    $ps.AddScript($ScriptBlock) | Out-Null
    $ps.AddParameter('WorkQueue', $workQueue) | Out-Null
    $threads += [pscustomobject]@{
        Ps = $ps
        Handle = $null
    }
}
# Start all the threads
$threads | % { $_.Handle = $_.Ps.BeginInvoke() }
# Wait for all the threads to complete - errors will still set the IsCompleted flag
while ($threads | ? { !$_.Handle.IsCompleted }) {
    Start-Sleep -Seconds 1
}
# Get any results and display an errors
$threads | % {
    $_.Ps.EndInvoke($_.Handle) | Write-Output
    if ($_.Ps.HadErrors) {
        $_.Ps.Streams.Error.ReadAll() | Write-Error
    }
}
Unlike powershell jobs, a RunspacePools can share resources. So there is one concurrent queue of all the queries, and each thread keeps its own connection to the database.
As others have said though - unless you're stress testing your database, you're probably better off reorganising the queries into bulk inserts.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With