Problem: I have huge amount of sql queries (around 10k-20k) and I want to run them asynchronous in 50 (or more) threads.
I wrote a powershell script for this job, but it is very slow (It took about 20 hours to execute all). Desired result is 3-4 hours max.
Question: How can I optimize this powershell script? Should I reconsider and use another technology like python
or c#
?
I think it's powershell issue, because when I check with whoisactive
the queries are executing fast. Creating, exiting and unloading jobs takes a lot of time, because for each thread is created separate PS instances.
My code:
$NumberOfParallerThreads = 50;
$Arr_AllQueries = @('Exec [mystoredproc] @param1=1, @param2=2',
'Exec [mystoredproc] @param1=11, @param2=22',
'Exec [mystoredproc] @param1=111, @param2=222')
#Creating the batches
$counter = [pscustomobject] @{ Value = 0 };
$Batches_AllQueries = $Arr_AllQueries | Group-Object -Property {
[math]::Floor($counter.Value++ / $NumberOfParallerThreads)
};
forEach ($item in $Batches_AllQueries) {
$tmpBatch = $item.Group;
$tmpBatch | % {
$ScriptBlock = {
# accept the loop variable across the job-context barrier
param($query)
# Execute a command
Try
{
Write-Host "[processing '$query']"
$objConnection = New-Object System.Data.SqlClient.SqlConnection;
$objConnection.ConnectionString = 'Data Source=...';
$ObjCmd = New-Object System.Data.SqlClient.SqlCommand;
$ObjCmd.CommandText = $query;
$ObjCmd.Connection = $objConnection;
$ObjCmd.CommandTimeout = 0;
$objAdapter = New-Object System.Data.SqlClient.SqlDataAdapter;
$objAdapter.SelectCommand = $ObjCmd;
$objDataTable = New-Object System.Data.DataTable;
$objAdapter.Fill($objDataTable) | Out-Null;
$objConnection.Close();
$objConnection = $null;
}
Catch
{
$ErrorMessage = $_.Exception.Message
$FailedItem = $_.Exception.ItemName
Write-Host "[Error processing: $($query)]" -BackgroundColor Red;
Write-Host $ErrorMessage
}
}
# pass the loop variable across the job-context barrier
Start-Job $ScriptBlock -ArgumentList $_ | Out-Null
}
# Wait for all to complete
While (Get-Job -State "Running") { Start-Sleep 2 }
# Display output from all jobs
Get-Job | Receive-Job | Out-Null
# Cleanup
Remove-Job *
}
UPDATE:
Resources: The DB server is on a remote machine with:
We want to use the maximum cpu power.
Framework limitation: The only limitation is not to use SQL Server to execute the queries. The requests should come from outside source like: Powershell, C#, Python, etc.
SQL Server allows applications to perform asynchronous database operations. Asynchronous processing enables methods to return immediately without blocking on the calling thread.
RunspacePool is the way to go here, try this:
$AllQueries = @( ... )
$MaxThreads = 5
# Each thread keeps its own connection but shares the query queue
$ScriptBlock = {
Param($WorkQueue)
$objConnection = New-Object System.Data.SqlClient.SqlConnection
$objConnection.ConnectionString = 'Data Source=...'
$objCmd = New-Object System.Data.SqlClient.SqlCommand
$objCmd.Connection = $objConnection
$objCmd.CommandTimeout = 0
$query = ""
while ($WorkQueue.TryDequeue([ref]$query)) {
$objCmd.CommandText = $query
$objAdapter = New-Object System.Data.SqlClient.SqlDataAdapter $objCmd
$objDataTable = New-Object System.Data.DataTable
$objAdapter.Fill($objDataTable) | Out-Null
}
$objConnection.Close()
}
# create a pool
$pool = [RunspaceFactory]::CreateRunspacePool(1, $MaxThreads)
$pool.ApartmentState = 'STA'
$pool.Open()
# convert the query array into a concurrent queue
$workQueue = New-Object System.Collections.Concurrent.ConcurrentQueue[object]
$AllQueries | % { $workQueue.Enqueue($_) }
$threads = @()
# Create each powershell thread and add them to the pool
1..$MaxThreads | % {
$ps = [powershell]::Create()
$ps.RunspacePool = $pool
$ps.AddScript($ScriptBlock) | Out-Null
$ps.AddParameter('WorkQueue', $workQueue) | Out-Null
$threads += [pscustomobject]@{
Ps = $ps
Handle = $null
}
}
# Start all the threads
$threads | % { $_.Handle = $_.Ps.BeginInvoke() }
# Wait for all the threads to complete - errors will still set the IsCompleted flag
while ($threads | ? { !$_.Handle.IsCompleted }) {
Start-Sleep -Seconds 1
}
# Get any results and display an errors
$threads | % {
$_.Ps.EndInvoke($_.Handle) | Write-Output
if ($_.Ps.HadErrors) {
$_.Ps.Streams.Error.ReadAll() | Write-Error
}
}
Unlike powershell jobs, a RunspacePools can share resources. So there is one concurrent queue of all the queries, and each thread keeps its own connection to the database.
As others have said though - unless you're stress testing your database, you're probably better off reorganising the queries into bulk inserts.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With