
Processing batch items in parallel

Tags:

msbuild

I have an ItemGroup, and want to process all its items in parallel (using a custom task or an .exe).

  • I could write my task/exe to accept the entire ItemGroup and process its items in parallel internally. However, I want this parallelism to respect MSBuild's /maxCpuCount parameter; otherwise I might end up over-parallelizing.
  • This thread says there's no way.
  • My testing shows that MSBuild's /maxCpuCount only parallelizes builds of different projects, not the processing of items (see the code below).

How can I process items from an ItemGroup in parallel?
Is there a way to author a custom task to work in parallel in conjunction with MSBuild's Parallel support?

<Project ToolsVersion="4.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
  <Target Name="Build" >
    <!-- Runs only once - I guess MSBuild detects it's the same project -->
    <!--<MSBuild Projects="$(MSBuildProjectFullPath);$(MSBuildProjectFullPath)" Targets="Wait3000" BuildInParallel="true" />-->

    <!-- Runs in parallel. Note that b.targets is a copy of the original a.targets -->
    <MSBuild Projects="$(MSBuildProjectFullPath);b.targets" Targets="Wait3000" BuildInParallel="true" />

    <!-- Runs sequentially -->
    <ItemGroup>
      <Waits Include="3000;2000"/>
    </ItemGroup>
    <Wait DurationMs="%(Waits.Identity)" />
  </Target>

  <Target Name="Wait3000">
    <Wait DurationMs="3000" />
  </Target>

  <UsingTask TaskName="Wait" TaskFactory="CodeTaskFactory" AssemblyFile="$(MSBuildToolsPath)\Microsoft.Build.Tasks.v4.0.dll" >
    <ParameterGroup>
      <DurationMs ParameterType="System.Int32" Required="true" />
    </ParameterGroup>
    <Task>
      <Code Type="Fragment" Language="cs">
        Log.LogMessage(string.Format("{0:HH\\:mm\\:ss\\:fff}  Start  DurationMs={1}", DateTime.Now, DurationMs), MessageImportance.High);
        System.Threading.Thread.Sleep(DurationMs);
        Log.LogMessage(string.Format("{0:HH\\:mm\\:ss\\:fff}  End    DurationMs={1}", DateTime.Now, DurationMs), MessageImportance.High);
      </Code>
    </Task>
  </UsingTask>
</Project>   
Jonathan asked Aug 18 '14 10:08



1 Answer

I know this is old, but if you get a few minutes, revisit your attempt to use the MSBuild task. Using the Properties and/or AdditionalProperties reserved item metadata elements* will resolve the issue you described in your code sample ("Runs only once - I guess MSBuild detects it's the same project").

The MSBuild file below processes items from an ItemGroup in parallel via MSBuild's own parallel support (so it honors /maxCpuCount). The parallelism itself needs neither BuildTargetsInParallel from the MSBuild Extension Pack nor any other custom or inline task; the Wait task from the question is used only to simulate work.

<Project ToolsVersion="4.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">

  <Target Name="Build" >
    <ItemGroup>
      <Waits Include="3000;2000"/>
    </ItemGroup>

    <ItemGroup>
      <ProjectItems Include="$(MSBuildProjectFullPath)">
        <Properties>
          WaitMs=%(Waits.Identity)
        </Properties>
      </ProjectItems>
    </ItemGroup>
    <MSBuild Projects="@(ProjectItems)" Targets="WaitSpecifiedMs" BuildInParallel="true" />
  </Target>

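  <Target Name="WaitSpecifiedMs">
    <!-- Wait is the inline task from the question; its UsingTask element must also be present in this file. -->
    <Wait DurationMs="$(WaitMs)" />
  </Target>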

</Project>
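
The same pattern should carry over to real items. Here is a minimal sketch, assuming a hypothetical InputFiles item list and a hypothetical ProcessOne target (neither is from the question). Message stands in for the real work, and the sketch assumes at least one file matches the wildcard:

```xml
<Project ToolsVersion="4.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">

  <ItemGroup>
    <!-- Hypothetical inputs; replace with the items you actually need to process -->
    <InputFiles Include="data\*.txt" />
  </ItemGroup>

  <Target Name="Build">
    <!-- Batching over InputFiles creates one Workers item per file.
         Each item points at this same project but carries a different global property. -->
    <ItemGroup>
      <Workers Include="$(MSBuildProjectFullPath)">
        <Properties>InputFile=%(InputFiles.FullPath)</Properties>
      </Workers>
    </ItemGroup>

    <!-- Different global properties make each entry a distinct build request,
         so they can be scheduled in parallel -->
    <MSBuild Projects="@(Workers)" Targets="ProcessOne" BuildInParallel="true" />
  </Target>

  <Target Name="ProcessOne">
    <!-- Placeholder for the real per-item work -->
    <Message Importance="high" Text="Processing $(InputFile)" />
  </Target>

</Project>
```

Since Properties is a semicolon-delimited list of name=value pairs, a path containing ';' or '=' would need escaping. Build with /maxCpuCount (or /m) so that MSBuild has more than one node to spread the ProcessOne invocations across.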

* Well-hidden under "Properties Metadata" on the MSBuild Task reference page.
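
As I read that reference page, the two metadata names behave differently: Properties metadata on an item overrides the task's Properties parameter, while AdditionalProperties metadata is appended to it. A small sketch of the second form, where the Label property and its value are made up for illustration:

```xml
<Project ToolsVersion="4.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">

  <Target Name="Build">
    <ItemGroup>
      <Waits Include="3000;2000" />
    </ItemGroup>

    <ItemGroup>
      <ProjectItems Include="$(MSBuildProjectFullPath)">
        <!-- Appended to the task-level Properties below instead of replacing them -->
        <AdditionalProperties>WaitMs=%(Waits.Identity)</AdditionalProperties>
      </ProjectItems>
    </ItemGroup>

    <!-- Label (hypothetical) is shared by every invocation; WaitMs differs per item -->
    <MSBuild Projects="@(ProjectItems)" Targets="ShowProperties" Properties="Label=demo" BuildInParallel="true" />
  </Target>

  <Target Name="ShowProperties">
    <Message Importance="high" Text="Label=$(Label) WaitMs=$(WaitMs)" />
  </Target>

</Project>
```

With Properties metadata in place of AdditionalProperties, I would expect $(Label) to come through empty in the child invocations, because the item metadata replaces the task-level list.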

weir answered Sep 29 '22 00:09