Run multiple models of an ensemble in parallel with PyTorch

Tags: python, deep-learning, pytorch, ensemble-learning

My neural network has the following architecture:

input -> 128x (separate fully connected layers) -> output averaging

I am using a ModuleList to hold the list of fully connected layers. Here's how it looks at this point:

import torch
import torch.nn as nn
import torch.optim as optim

class MultiHead(nn.Module):
    def __init__(self, dim_state, dim_action, hidden_size=32, nb_heads=1):
        super(MultiHead, self).__init__()

        self.networks = nn.ModuleList()
        for _ in range(nb_heads):
            network = nn.Sequential(
                nn.Linear(dim_state, hidden_size),
                nn.Tanh(),
                nn.Linear(hidden_size, dim_action)
            )
            self.networks.append(network)

        self.cuda()
        self.optimizer = optim.Adam(self.parameters())

Then, when I need to calculate the output, I use a for ... in construct to perform the forward and backward pass through all the layers:

q_values = torch.cat([net(observations) for net in self.networks])

# skipped code which ultimately computes the loss I need

self.optimizer.zero_grad()
loss.backward()
self.optimizer.step()

This works! But I am wondering whether I could do this more efficiently. I feel like by using a for...in loop, I am going through each separate FC layer one by one, whereas I'd expect this operation could be done in parallel.
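For reference, here is a self-contained version of the loop-based approach from the question, written as a minimal PyTorch sketch. The sizes are illustrative assumptions. It also shows a detail of the question's code: torch.cat joins along dim 0, so the heads end up stacked along the batch axis. By contrast, torch.stack(..., dim=1) keeps a separate head axis, which turns the final averaging into a single mean call.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Illustrative sizes (assumptions, not taken from the question)
B, dim_state, dim_action, hidden_size, nb_heads = 4, 8, 3, 32, 128

networks = nn.ModuleList([
    nn.Sequential(
        nn.Linear(dim_state, hidden_size),
        nn.Tanh(),
        nn.Linear(hidden_size, dim_action),
    )
    for _ in range(nb_heads)
])

observations = torch.randn(B, dim_state)

# One forward call per head, executed sequentially in Python
outputs = [net(observations) for net in networks]

# torch.cat joins along dim 0: shape (nb_heads * B) x dim_action
q_cat = torch.cat(outputs)

# torch.stack with dim=1 keeps a head axis: shape B x nb_heads x dim_action
q_stack = torch.stack(outputs, dim=1)

# Averaging over the heads gives B x dim_action
q_mean = q_stack.mean(dim=1)
print(q_cat.shape, q_stack.shape, q_mean.shape)
```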


asked Oct 14 '19 10:10

MasterScrat

1 Answer

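If you use Convnd layers (e.g. Conv1d) in place of Linear, you can use the groups argument to get "grouped convolutions" (depthwise convolutions are the special case where groups equals the number of input channels). Grouping splits the channels into independent blocks, which lets you handle all parallel networks simultaneously in a single call.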
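Here is a small PyTorch sketch of what groups does, using arbitrary illustrative sizes. With groups=nb_heads, each block of output channels is computed only from its own block of input channels. Perturbing the inputs that belong to one head therefore leaves the other heads' outputs unchanged.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Illustrative sizes (assumptions)
nb_heads, in_per_head, out_per_head = 3, 4, 2

# groups=nb_heads splits the channels into nb_heads independent blocks:
# output block g only sees input block g.
conv = nn.Conv1d(in_channels=in_per_head * nb_heads,
                 out_channels=out_per_head * nb_heads,
                 kernel_size=1, groups=nb_heads)

# Weight shape is (out_channels, in_channels // groups, kernel_size)
print(conv.weight.shape)

x = torch.randn(5, in_per_head * nb_heads, 1)
y = conv(x)

# Perturb only the input channels belonging to head 0
x2 = x.clone()
x2[:, :in_per_head] += 1.0
y2 = conv(x2)

# Largest change per output channel, grouped by head: only head 0's row is non-zero
changed = (y2 - y).abs().amax(dim=(0, 2)).view(nb_heads, out_per_head)
print(changed)
```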

If you use a convolution kernel of size 1, the convolution is equivalent to applying a Linear layer at each position, with each channel acting as one input feature. So the rough structure of your network would look like this:

Modify the input tensor of shape B x dim_state: replicate it nb_heads times along the feature dimension and add a trailing dimension of size 1, taking it from B x dim_state to B x (dim_state * nb_heads) x 1.
Replace the two Linear layers (keeping the Tanh between them) with

nn.Conv1d(in_channels=dim_state * nb_heads, out_channels=hidden_size * nb_heads, kernel_size=1, groups=nb_heads)

and

nn.Conv1d(in_channels=hidden_size * nb_heads, out_channels=dim_action * nb_heads, kernel_size=1, groups=nb_heads)

We now have a tensor of shape B x (dim_action * nb_heads) x 1, which you can reshape to whatever you need (e.g. B x nb_heads x dim_action).
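Putting the steps together, a minimal sketch of the whole model might look like the following. MultiHeadGrouped is a hypothetical name, and the optimizer and .cuda() call from the question are left out. The input is replicated with Tensor.repeat, which copies the data; this is one possible choice, not the only one.

```python
import torch
import torch.nn as nn


class MultiHeadGrouped(nn.Module):
    """nb_heads independent 2-layer MLPs evaluated as grouped 1x1 convolutions.

    Hypothetical class name; a sketch of the approach described above.
    """

    def __init__(self, dim_state, dim_action, hidden_size=32, nb_heads=1):
        super().__init__()
        self.nb_heads = nb_heads
        self.dim_action = dim_action
        self.net = nn.Sequential(
            nn.Conv1d(dim_state * nb_heads, hidden_size * nb_heads,
                      kernel_size=1, groups=nb_heads),
            nn.Tanh(),
            nn.Conv1d(hidden_size * nb_heads, dim_action * nb_heads,
                      kernel_size=1, groups=nb_heads),
        )

    def forward(self, observations):
        # B x dim_state -> B x (dim_state * nb_heads): one copy of the input per head
        x = observations.repeat(1, self.nb_heads)
        # -> B x (dim_state * nb_heads) x 1 (Conv1d expects a length axis)
        x = x.unsqueeze(-1)
        # -> B x (dim_action * nb_heads) x 1
        y = self.net(x)
        # -> B x nb_heads x dim_action (head g owns channels g*dim_action .. (g+1)*dim_action - 1)
        return y.view(y.shape[0], self.nb_heads, self.dim_action)


model = MultiHeadGrouped(dim_state=8, dim_action=3, hidden_size=32, nb_heads=128)
obs = torch.randn(4, 8)
q_values = model(obs)          # 4 x 128 x 3
q_mean = q_values.mean(dim=1)  # 4 x 3, averaged over the heads
```

To run on a GPU, move the model and inputs there as usual (e.g. model.cuda()). The parameters of all heads now live in two weight tensors, so a single optim.Adam(model.parameters()) covers every head, just as in the ModuleList version.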

While cuDNN natively supports grouped convolutions, PyTorch had some issues with the speed of grouped convolutions (see e.g. here), but I think those have since been resolved.
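To check the approach on your own setup, here is a sketch that copies the weights of a ModuleList ensemble into the grouped convolutions, verifies that both produce the same outputs, and times them. The sizes and the helper names (loop_forward, grouped_forward, bench) are hypothetical, and the timing is deliberately crude. Which version is faster depends on the hardware, the sizes and the PyTorch version, so treat the numbers as a measurement, not a guarantee.

```python
import time

import torch
import torch.nn as nn

torch.manual_seed(0)

# Illustrative sizes (assumptions)
B, dim_state, dim_action, hidden_size, nb_heads = 64, 8, 3, 32, 128

heads = nn.ModuleList([
    nn.Sequential(nn.Linear(dim_state, hidden_size), nn.Tanh(),
                  nn.Linear(hidden_size, dim_action))
    for _ in range(nb_heads)
])

grouped = nn.Sequential(
    nn.Conv1d(dim_state * nb_heads, hidden_size * nb_heads, kernel_size=1, groups=nb_heads),
    nn.Tanh(),
    nn.Conv1d(hidden_size * nb_heads, dim_action * nb_heads, kernel_size=1, groups=nb_heads),
)

# Copy each head's Linear weights into its block of the grouped convolutions.
with torch.no_grad():
    for layer_idx in (0, 2):  # the two Linear / Conv1d layers (index 1 is Tanh)
        grouped[layer_idx].weight.copy_(
            torch.cat([h[layer_idx].weight for h in heads]).unsqueeze(-1))
        grouped[layer_idx].bias.copy_(
            torch.cat([h[layer_idx].bias for h in heads]))


def loop_forward(x):
    # One call per head, as in the question: B x nb_heads x dim_action
    return torch.stack([h(x) for h in heads], dim=1)


def grouped_forward(x):
    # A single grouped call for all heads: B x nb_heads x dim_action
    y = grouped(x.repeat(1, nb_heads).unsqueeze(-1))
    return y.view(x.shape[0], nb_heads, dim_action)


def bench(fn, x, iters=20):
    # Crude CPU wall-clock timing; on a GPU you would also need torch.cuda.synchronize()
    with torch.no_grad():
        fn(x)  # warm-up
        start = time.perf_counter()
        for _ in range(iters):
            fn(x)
        return (time.perf_counter() - start) / iters


x = torch.randn(B, dim_state)
print(torch.allclose(loop_forward(x), grouped_forward(x), atol=1e-5))  # True
print(f"loop:    {bench(loop_forward, x) * 1e3:.3f} ms")
print(f"grouped: {bench(grouped_forward, x) * 1e3:.3f} ms")
```

On a GPU, move both models and x to the device and synchronize before reading the clock, since CUDA kernels run asynchronously. torch.utils.benchmark is a more careful alternative to this hand-rolled timer.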

answered Oct 29 '22 18:10

flawr
