 

Creating one hot vector from indices given as a tensor

Tags:

pytorch

I have a tensor of size 4 x 6, where 4 is the batch size and 6 is the sequence length. Every element of the sequence vectors is some index (0 to n). I want to create a 4 x 6 x n tensor where the vectors in the 3rd dimension are the one-hot encodings of those indices, i.e. a 1 at the specified index and zeros everywhere else.

For example, I have the following tensor:

[[5, 3, 2, 11, 15, 15],
[1, 4, 6, 7, 3, 3],
[2, 4, 7, 8, 9, 10],
[11, 12, 15, 2, 5, 7]]

Here, all the values are in the range 0 to n, where n = 15. So I want to convert the tensor to a 4 x 6 x 16 tensor where the third dimension holds the one-hot encoding vectors.

How can I do that using PyTorch functionality? Right now I am doing this with a loop, but I want to avoid looping!

Wasi Ahmad asked Jun 09 '17



1 Answer

New answer: As of PyTorch 1.1, there is a one_hot function in torch.nn.functional. Given any tensor of indices indices and a number of classes n (one more than the maximal index), you can create a one-hot version as follows:

import torch

n = 5
indices = torch.randint(0, n, size=(4, 7))
one_hot = torch.nn.functional.one_hot(indices, n)  # size = (4, 7, n)
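Applied to the example tensor from the question (a sketch; num_classes can be omitted and inferred from the data, but passing 16 makes the depth explicit):

```python
import torch
import torch.nn.functional as F

x = torch.tensor([[5, 3, 2, 11, 15, 15],
                  [1, 4, 6, 7, 3, 3],
                  [2, 4, 7, 8, 9, 10],
                  [11, 12, 15, 2, 5, 7]])

one_hot = F.one_hot(x, num_classes=16)
print(one_hot.shape)  # torch.Size([4, 6, 16])
print(one_hot[0, 0])  # a 1 at index 5, zeros elsewhere
```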

Very old answer

At the moment, slicing and indexing can be a bit of a pain in PyTorch from my experience. I assume you don't want to convert your tensors to numpy arrays. The most elegant way I can think of at the moment is to use sparse tensors and then convert to a dense tensor. That would work as follows:

import torch
from torch.sparse import FloatTensor as STensor

batch_size = 4
seq_length = 6
feat_dim = 16

# Batch (row) index for each of the 24 entries: 0,0,...,0,1,1,...
batch_idx = torch.LongTensor([i for i in range(batch_size) for s in range(seq_length)])
# Sequence (column) index for each entry: 0..5, repeated for every row
seq_idx = torch.LongTensor(list(range(seq_length)) * batch_size)
# The index values themselves, flattened into a vector of length 24
feat_idx = torch.LongTensor([[5, 3, 2, 11, 15, 15], [1, 4, 6, 7, 3, 3],
                             [2, 4, 7, 8, 9, 10], [11, 12, 15, 2, 5, 7]]).view(24,)

my_stack = torch.stack([batch_idx, seq_idx, feat_idx])  # indices must be nDim * nEntries
my_final_array = STensor(my_stack, torch.ones(batch_size * seq_length),
                         torch.Size([batch_size, seq_length, feat_dim])).to_dense()

print(my_final_array)
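On PyTorch versions that predate one_hot but support scatter_, a zeros-plus-scatter_ approach also avoids both the loop and sparse tensors (a sketch I am adding, not part of the original answer):

```python
import torch

batch_size, seq_length, feat_dim = 4, 6, 16
x = torch.tensor([[5, 3, 2, 11, 15, 15],
                  [1, 4, 6, 7, 3, 3],
                  [2, 4, 7, 8, 9, 10],
                  [11, 12, 15, 2, 5, 7]])

# Scatter a 1 into a zero tensor along the last dimension at each index.
one_hot = torch.zeros(batch_size, seq_length, feat_dim)
one_hot.scatter_(2, x.unsqueeze(-1), 1.0)
print(one_hot[0, 0])  # 1.0 at index 5, zeros elsewhere
```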

Note: PyTorch is currently undergoing some work that will add numpy-style broadcasting and other functionality within the next two or three weeks, so better solutions may well be available in the near future.

Hope this helps you a bit.

mbpaulus answered Nov 10 '22