Embedding Layers in PyTorch are listed under "Sparse Layers" with the limitation:
Keep in mind that only a limited number of optimizers support sparse gradients: currently it’s optim.SGD (cuda and cpu), and optim.Adagrad (cpu)
What is the reason for this? For example, in Keras I can train an architecture with an Embedding layer using any optimizer.
Uses of PyTorch Embedding
We can say that the embedding layer works like a lookup table in which each word is converted to a number, and these numbers index into the table. Thus, the keys are words and the values are word vectors.
While a Dense layer treats W as an actual weight matrix, an Embedding layer treats W as a simple lookup table, and it treats each integer in the input array as the index at which to find the corresponding row of weights in that table.
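A minimal sketch of this lookup behaviour (sizes are illustrative): integer indices go in, and the output is just the indexed rows of the layer's weight matrix.

import torch
import torch.nn as nn

# The Embedding weight is a table of shape (num_embeddings, embedding_dim);
# integer inputs select rows from it.
emb = nn.Embedding(num_embeddings=10, embedding_dim=4)

indices = torch.tensor([1, 5, 1])                  # word indices, not one-hot vectors
vectors = emb(indices)                             # shape: (3, 4)

# The same rows can be read directly from the weight matrix.
print(torch.equal(vectors, emb.weight[indices]))   # True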
Embeddings make it easier to do machine learning on large inputs like sparse vectors representing words. Ideally, an embedding captures some of the semantics of the input by placing semantically similar inputs close together in the embedding space. An embedding can be learned and reused across models.
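As a sketch of the "reused across models" point, nn.Embedding.from_pretrained wraps an existing weight matrix (here a random stand-in for vectors learned elsewhere):

import torch
import torch.nn as nn

pretrained = torch.randn(10, 4)                    # stand-in for previously learned vectors
emb = nn.Embedding.from_pretrained(pretrained, freeze=True)

print(emb(torch.tensor([0, 3])).shape)             # torch.Size([2, 4])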
An LSTM network is a type of recurrent neural network (RNN) that can learn long-term dependencies between time steps of sequence data. A word embedding layer maps a sequence of word indices to embedding vectors and learns the word embedding during training.
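A minimal sketch of that pattern, with hypothetical sizes: an embedding layer maps word indices to vectors, which then feed an LSTM.

import torch
import torch.nn as nn

class TextModel(nn.Module):
    def __init__(self, vocab_size=1000, embed_dim=50, hidden_dim=64):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)

    def forward(self, token_ids):                  # (batch, seq_len) of word indices
        embedded = self.embedding(token_ids)       # (batch, seq_len, embed_dim)
        output, _ = self.lstm(embedded)            # (batch, seq_len, hidden_dim)
        return output

model = TextModel()
tokens = torch.randint(0, 1000, (2, 7))            # batch of 2 sequences, length 7
print(model(tokens).shape)                         # torch.Size([2, 7, 64])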
Upon closer inspection, sparse gradients on Embeddings are optional and can be turned on or off with the sparse parameter:
class torch.nn.Embedding(num_embeddings, embedding_dim, padding_idx=None, max_norm=None, norm_type=2, scale_grad_by_freq=False, sparse=False)
Where:
sparse (boolean, optional) – if True, gradient w.r.t. weight matrix will be a sparse tensor. See Notes for more details regarding sparse gradients.
And the "Notes" mentioned are what I quoted in the question about a limited number of optimizers being supported for sparse gradients.
Update:
It is theoretically possible but technically difficult to implement some optimization methods on sparse gradients. There is an open issue in the PyTorch repo to add support for all optimizers.
Regarding the original question, I believe Embeddings can be treated as sparse because it is possible to operate on the input indices directly rather than converting them to one-hot encodings for input into a dense layer. This is explained in @Maxim's answer to my related question.
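A short sketch of that point: looking up rows by index gives the same result as multiplying a one-hot encoding by the weight matrix, but without ever materialising the one-hot vectors, which is why the resulting gradient only touches a few rows.

import torch
import torch.nn as nn
import torch.nn.functional as F

emb = nn.Embedding(10, 4)
indices = torch.tensor([2, 7])

via_lookup = emb(indices)                                # (2, 4), direct row lookup
one_hot = F.one_hot(indices, num_classes=10).float()     # (2, 10), dense one-hot encoding
via_matmul = one_hot @ emb.weight                        # (2, 4), equivalent dense-layer view

print(torch.allclose(via_lookup, via_matmul))            # True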