I'm trying to implement a basic pipelined model using Graphcore's PopART framework (part of the Poplar API) to speed up my model, which is split over multiple processors.
I'm following their example code, but I notice the example does not use the pipelineStage() call, which is used in some of their other applications (namely BERT), and instead uses virtualGraph() to define the processor the operations should run on.
A small snippet of the example below:
    # Dense 1
    W0 = builder.addInitializedInputTensor(
        init_weights(num_features, 512))
    b0 = builder.addInitializedInputTensor(init_biases(512))
    with builder.virtualGraph(0):
        x1 = builder.aiOnnx.gemm([x0, W0, b0], debugPrefix="gemm_x1")
        x2 = builder.aiOnnx.relu([x1], debugPrefix="relu_x2")

    # Dense 2
    W1 = builder.addInitializedInputTensor(init_weights(512, num_classes))
    b1 = builder.addInitializedInputTensor(init_biases(num_classes))
    with builder.virtualGraph(1):
        x3 = builder.aiOnnx.gemm([x2, W1, b1], debugPrefix="gemm_x3")
        x4 = builder.aiOnnx.relu([x3], debugPrefix="relu_x4")
Conversely, the BERT example seems to create a context that combines virtualGraph() with pipelineStage():
self.stack.enter_context(self.builder.pipelineStage(self.pipelineStage))
I'm not sure which should be the preferred style. Are there any implications to only using virtualGraph()?
virtualGraph and pipelineStage are two different concepts in the Graphcore PopART framework, though they are related.
virtualGraph (see the "Setting the IPU number for operations" section of the PopART User Guide and the PopART C++ API for reference) enables splitting a graph into multiple parts, to run on multiple IPUs. Using virtualGraph on its own, as shown in the code example you referred to, means the parts of the model run sequentially, each on its allocated IPU.
On the other hand, pipelineStage allows you to cut your graph into several stages that can, when possible, be run in parallel on different IPUs. You have the flexibility to choose which ops should be placed in each pipeline stage. Pipelining is enabled by the option opts.enablePipelining (see the PopART C++ API).
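To make the difference between sequential and pipelined execution concrete, here is a small toy model in plain Python. It does not use PopART at all: the function names are hypothetical, and it assumes an idealized forward-only pipeline in which every stage takes exactly one time step per micro-batch and inter-IPU copies are free.

```python
from typing import List, Optional


def sequential_steps(num_stages: int, num_microbatches: int) -> int:
    """Time steps needed when only one stage is active at any moment."""
    return num_stages * num_microbatches


def pipelined_schedule(num_stages: int,
                       num_microbatches: int) -> List[List[Optional[int]]]:
    """For each time step, list the micro-batch each stage works on.

    None means the stage is idle (pipeline ramp-up or ramp-down).
    """
    total_steps = num_stages + num_microbatches - 1
    schedule = []
    for t in range(total_steps):
        row = []
        for stage in range(num_stages):
            # Stage s sees micro-batch m at time step m + s.
            microbatch = t - stage
            row.append(microbatch if 0 <= microbatch < num_microbatches else None)
        schedule.append(row)
    return schedule


if __name__ == "__main__":
    stages, microbatches = 2, 4
    print("sequential steps:", sequential_steps(stages, microbatches))
    for t, row in enumerate(pipelined_schedule(stages, microbatches)):
        print("step", t, row)
```

Under these assumptions, two stages and four micro-batches need 8 steps when run sequentially but only 5 when pipelined, because both stages are busy at the same time once the pipeline has filled.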
By default, pipelining implicitly creates one pipelineStage per virtualGraph.
However, it is possible to specify more than one pipelineStage for a single virtualGraph. This can be useful when two parts of a model share the same large data set. That’s why a combination of pipelineStage and virtualGraph is used in the BERT model: BERT has a large embeddings matrix, which is used at the start and at the end of the model. Both operations can be placed in the same virtualGraph, and therefore on the same IPU, so that the shared data is not copied to multiple IPUs but remains available at different pipeline stages, since those two stages are not executed one after the other.
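The placement idea can be sketched without PopART as a small bookkeeping model. Everything here is hypothetical and for illustration only (the Op class, the op names, and the weight name are not PopART API): each op carries a virtual graph, an optional pipeline stage that defaults to the virtual graph index, and the name of a weight tensor it reads.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Op:
    name: str
    virtual_graph: int                    # IPU the op is placed on
    pipeline_stage: Optional[int] = None  # None = not set explicitly
    weights: Optional[str] = None         # hypothetical weight tensor name


def effective_stage(op: Op) -> int:
    """Fall back to one pipeline stage per virtual graph if none is given."""
    return op.virtual_graph if op.pipeline_stage is None else op.pipeline_stage


def ipus_holding(ops: List[Op], weights: str) -> Set[int]:
    """IPUs that need a copy of the given weight tensor."""
    return {op.virtual_graph for op in ops if op.weights == weights}


# BERT-like layout: first and last stage share virtual graph 0.
shared = [
    Op("embedding", virtual_graph=0, pipeline_stage=0, weights="embedding_matrix"),
    Op("encoder", virtual_graph=1, pipeline_stage=1),
    Op("projection", virtual_graph=0, pipeline_stage=2, weights="embedding_matrix"),
]

# Alternative layout: one stage per virtual graph, so the last op moves to IPU 2.
split = [
    Op("embedding", virtual_graph=0, weights="embedding_matrix"),
    Op("encoder", virtual_graph=1),
    Op("projection", virtual_graph=2, weights="embedding_matrix"),
]

if __name__ == "__main__":
    print(ipus_holding(shared, "embedding_matrix"))
    print(ipus_holding(split, "embedding_matrix"))
```

In the shared layout the embedding matrix lives on a single IPU even though it is used in two different pipeline stages, while the one-stage-per-virtual-graph layout would need a copy on two IPUs.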