How to map elements in a PyTorch tensor to ids?

Tags:

python-3.x

pytorch

Given a tensor:

A = torch.tensor([2., 3., 4., 5., 6., 7.])

Then, give each element in A an id:

id = torch.arange(A.shape[0], dtype = torch.int)   # tensor([0,1,2,3,4,5])

In other words, the id of 2. in A is 0, the id of 3. in A is 1, and so on:

2. -> 0
3. -> 1
4. -> 2
5. -> 3
6. -> 4
7. -> 5

Then, I have a new tensor:

B = torch.tensor([3., 6., 6., 5., 4., 4., 4.])

Is there any way in PyTorch to map each element in B to its id? In other words, I want to obtain tensor([1, 4, 4, 3, 2, 2, 2]), in which each element is the id of the corresponding element in B.

489

asked Jan 04 '21 15:01

MarioKZZ

3 Answers

What you ask can be done by slowly iterating over the whole B tensor, checking each of its elements against all elements of A, and retrieving the index of each match:

In [*]: for x in B:
    ...:     print(torch.where(x==A)[0][0])
    ...:
    ...:
tensor(1)
tensor(4)
tensor(4)
tensor(3)
tensor(2)
tensor(2)
tensor(2)

Here I used torch.where to find the positions of all True elements in x==A, where x takes the value of each element of B in turn. This is really slow, but it allows you to add some handling for cases where an element of B does not appear in A (as written, torch.where(x==A)[0][0] raises an IndexError for such an element).
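A minimal sketch of such handling, assuming that a missing element should get the made-up sentinel id -1 (that choice is an assumption for illustration, not part of the answer):

```python
import torch

A = torch.tensor([2., 3., 4., 5., 6., 7.])
B = torch.tensor([3., 6., 6., 5., 4., 4., 100.])  # 100. is not in A

MISSING = -1  # hypothetical sentinel id for elements of B not found in A

ids = []
for x in B:
    # Positions in A equal to x; this tensor is empty if x is not in A.
    matches = torch.where(x == A)[0]
    ids.append(matches[0].item() if matches.numel() > 0 else MISSING)

result = torch.tensor(ids)
print(result.tolist())  # [1, 4, 4, 3, 2, 2, -1]
```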

A fast but dirty way to get what you want with vectorized tensor operations is:

In [*]: (B.view(-1,1) == A).int().argmax(dim=1)
Out[*]: tensor([1, 4, 4, 3, 2, 2, 2])

This trick takes advantage of the fact that argmax returns the index of the first maximal value along dim=1, i.e. the position of the first True in each row of the comparison matrix.
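To make the intermediate step visible, here is a small sketch that splits the one-liner into the broadcast comparison and the argmax (same A and B as in the question):

```python
import torch

A = torch.tensor([2., 3., 4., 5., 6., 7.])
B = torch.tensor([3., 6., 6., 5., 4., 4., 4.])

# B.view(-1, 1) has shape (7, 1) and A has shape (6,), so the comparison
# broadcasts to a (7, 6) boolean matrix: eq[i, j] is True iff B[i] == A[j].
eq = B.view(-1, 1) == A
print(eq.shape)  # torch.Size([7, 6])

# With unique values in A and every B[i] present in A, each row holds exactly
# one True (1 after .int()); argmax over dim=1 returns its column, the id of B[i].
ids = eq.int().argmax(dim=1)
print(ids.tolist())  # [1, 4, 4, 3, 2, 2, 2]
```

Note that the comparison matrix has len(B) × len(A) entries, so memory grows with the product of both sizes.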

Big warning here: if an element does not exist in A, no error is raised, and the result is silently 0 for every such element:

In [*]: C = torch.tensor([100, 1000, 1, 3, 9999])

In [*]: (C.view(-1,1) == A).int().argmax(dim=1)
Out[*]: tensor([0, 0, 0, 1, 0])
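One way to detect those silent zeros is to check whether each row of the comparison matrix contains any match at all. A minimal sketch, assuming -1 is an acceptable (hypothetical) marker for "not found":

```python
import torch

A = torch.tensor([2., 3., 4., 5., 6., 7.])
C = torch.tensor([100, 1000, 1, 3, 9999])

eq = C.view(-1, 1) == A           # (5, 6) boolean comparison matrix
ids = eq.int().argmax(dim=1)      # 0 for rows without any match (silently wrong)
found = eq.any(dim=1)             # True only where C[i] occurs somewhere in A

# Keep the argmax result where a match exists, otherwise use the sentinel -1.
ids = torch.where(found, ids, torch.full_like(ids, -1))
print(ids.tolist())  # [-1, -1, -1, 1, -1]
```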

171

answered Oct 27 '22 20:10

Makis Tsantekidis

I don't think PyTorch has a built-in function for mapping a tensor this way.

It seems quite unreasonable to solve this by comparing each value from B to all values from A.

Here are two possible solutions to solve this problem.

Using a dictionary as a map

You can use a dictionary. It is not much of a pure-PyTorch solution, but it will most probably be the fastest and safest way.

Just create a dict to map each element to an id, then use it to map B:

>>> mapping = {x.item(): i for i, x in enumerate(A)}

>>> torch.tensor([mapping[x.item()] for x in B])
tensor([1, 4, 4, 3, 2, 2, 2])
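Iterating over a tensor yields 0-d tensors, which is why the answer calls .item() on every element; calling tolist() once gives plain Python numbers instead. The dictionary lookup as written also raises a KeyError for a value that is not in A. A minimal variation covering both points (the -1 default for missing values is an assumption, not part of the answer):

```python
import torch

A = torch.tensor([2., 3., 4., 5., 6., 7.])
B = torch.tensor([3., 6., 6., 5., 4., 4., 4., 100.])  # 100. is not in A

# value -> id, built from plain Python floats rather than 0-d tensors
mapping = {x: i for i, x in enumerate(A.tolist())}

# dict.get with a default (hypothetical sentinel -1) avoids a KeyError
# for elements of B that are not present in A.
result = torch.tensor([mapping.get(x, -1) for x in B.tolist()])
print(result.tolist())  # [1, 4, 4, 3, 2, 2, 2, -1]
```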

Change of basis approach

An alternative using only torch.Tensor operations. It requires the values you want to map (the content of A) to be non-negative integers, because they will be used to index a tensor.

Encode the content of A into one-hot encodings:

>>> A_enc = torch.zeros((int(A.max()) + 1,) * 2)
>>> A_enc[A.long(), torch.arange(A.shape[0])] = 1

>>> A_enc
tensor([[0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0., 0.],
        [0., 1., 0., 0., 0., 0., 0., 0.],
        [0., 0., 1., 0., 0., 0., 0., 0.],
        [0., 0., 0., 1., 0., 0., 0., 0.],
        [0., 0., 0., 0., 1., 0., 0., 0.],
        [0., 0., 0., 0., 0., 1., 0., 0.]])

We'll use A_enc as our basis to map integers:

>>> v = torch.argmax(A_enc, dim=1)
>>> v
tensor([0, 0, 0, 1, 2, 3, 4, 5])

Now, given an integer, for instance x=3, we can encode it as a one-hot vector: x_enc = [0, 0, 0, 1, 0, 0, 0, 0]. Then we use v to map it: the dot product v · x_enc gives 1, which is the desired result (the first element of mapped B). Instead of encoding one value at a time, we compute the matrix product between v and the encoded B. First encode B, then compute v @ B_enc:
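The single-value case can be checked directly. This sketch hard-codes v as printed above (v[value] is the id of that value in A, 0 for values not in A):

```python
import torch

# v as printed above: index = value, entry = id of that value in A
v = torch.tensor([0, 0, 0, 1, 2, 3, 4, 5])

x = 3
x_enc = torch.zeros(v.shape[0], dtype=torch.long)
x_enc[x] = 1                  # one-hot encoding: [0, 0, 0, 1, 0, 0, 0, 0]

# The dot product picks out v[x], i.e. the id of 3. in A.
print(torch.dot(v, x_enc).item())  # 1
```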

>>> B_enc = torch.zeros(A_enc.shape[0], B.shape[0])
>>> B_enc[B.long(), torch.arange(B.shape[0])] = 1

>>> B_enc
tensor([[0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 1., 1., 1.],
        [0., 0., 0., 1., 0., 0., 0.],
        [0., 1., 1., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.]])

>>> v@B_enc.long()
tensor([1, 4, 4, 3, 2, 2, 2])

Note: since the values of A and B are used as indices, they must be integer (torch.long) tensors, either defined that way or converted with .long().
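Since each column of B_enc has exactly one 1, the product v @ B_enc just picks v[B[j]] for every column j; in other words, v is a lookup table from value to id. A minimal sketch that builds such a table directly and indexes it with B, skipping the one-hot matrices (the -1 fill for values absent from A is an assumption; it replaces the 0 that v uses, so missing values are distinguishable from id 0). It still assumes non-negative integer values no larger than A.max(), otherwise indexing raises an IndexError:

```python
import torch

A = torch.tensor([2., 3., 4., 5., 6., 7.])
B = torch.tensor([3., 6., 6., 5., 4., 4., 4.])

# Lookup table indexed by value: lut[value] = id of that value in A.
# Values that do not occur in A keep the fill value -1.
lut = torch.full((int(A.max()) + 1,), -1, dtype=torch.long)
lut[A.long()] = torch.arange(A.shape[0])

mapped = lut[B.long()]
print(mapped.tolist())  # [1, 4, 4, 3, 2, 2, 2]
```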

answered Oct 27 '22 18:10

Ivan

There is a similar question for NumPy, so my answer is heavily inspired by the solutions there. I will compare some of the methods mentioned using perfplot. I will also generalize the problem to applying an arbitrary mapping to a tensor (yours is just a specific case).

For the analysis, I will assume that the mapping contains all the unique elements of the tensor and that the number of elements in the mapping is small and constant.

import torch


def apply(a: torch.Tensor, ids: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    mapping = {k.item(): v.item() for k, v in zip(a, ids)}
    return b.clone().apply_(lambda x: mapping.__getitem__(x))


def bucketize(a: torch.Tensor, ids: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    mapping = {k.item(): v.item() for k, v in zip(a, ids)}

    # From `https://stackoverflow.com/questions/13572448`.
    palette, key = zip(*mapping.items())
    key = torch.tensor(key)
    palette = torch.tensor(palette)

    index = torch.bucketize(b.ravel(), palette)
    remapped = key[index].reshape(b.shape)

    return remapped


def iterate(a: torch.Tensor, ids: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    mapping = {k.item(): v.item() for k, v in zip(a, ids)}
    return torch.tensor([mapping[x.item()] for x in b])


def argmax(a: torch.Tensor, ids: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    return (b.view(-1, 1) == a).int().argmax(dim=1)


if __name__ == "__main__":
    import perfplot

    a = torch.arange(2, 8)
    ids = torch.arange(0, 6)

    perfplot.show(
        setup=lambda n: torch.randint(2, 8, (n,)),
        kernels=[
            lambda x: apply(a, ids, x),
            lambda x: bucketize(a, ids, x),
            lambda x: iterate(a, ids, x),
            lambda x: argmax(a, ids, x),
        ],
        labels=["apply", "bucketize", "iterate", "argmax"],
        n_range=[2 ** k for k in range(25)],
        xlabel="len(b)",
    )

Running this yields the following plot:

[Plot: speed against array size]

Hence, depending on the number of elements in your tensor, you can pick the argmax method (with the caveats mentioned above and the restriction that the ids must be the positions 0 to N-1 in a), apply, or bucketize.
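If you pick bucketize, keep in mind that torch.bucketize expects its boundaries to be sorted in increasing order. In the benchmark the keys come from torch.arange, so this holds, but for an arbitrary mapping the keys (and their ids, in the same order) should be sorted first. A minimal sketch with a made-up unsorted mapping:

```python
import torch

# Hypothetical unsorted mapping: 7->0, 2->1, 5->2, 3->3, 6->4, 4->5
a = torch.tensor([7, 2, 5, 3, 6, 4])
ids = torch.arange(6)
b = torch.tensor([3, 6, 6, 5, 4, 4, 4])

# torch.bucketize requires monotonically increasing boundaries,
# so sort the keys and reorder the ids with the same permutation.
keys_sorted, order = torch.sort(a)
ids_sorted = ids[order]

# For a value equal to keys_sorted[k], bucketize (right=False) returns k.
index = torch.bucketize(b.ravel(), keys_sorted)
remapped = ids_sorted[index].reshape(b.shape)
print(remapped.tolist())  # [3, 4, 4, 2, 5, 5, 5]
```

Like argmax, this does not detect values that are not keys: bucketize returns an insertion position, so such a value silently gets a neighbouring key's id, or raises an IndexError if it is larger than every key.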

Now, if we increase the number of elements in the mapping to, say, ten thousand (a = torch.arange(2, 10002) and ids = torch.arange(0, 10000)), we get the following results:

[Plot: speed against array size with more elements in the mapping]

This means the speedup of bucketize only becomes visible for larger arrays, but it still outperforms the other methods (the argmax method, which builds a len(b) × len(a) comparison matrix, was killed, so I had to remove it).

Lastly, if the tensor contains values that are not keys of the mapping, we can first map every unique value of the tensor to itself and then update the dictionary with the actual mapping:

mapping = {x.item(): x.item() for x in torch.unique(b)}
mapping.update({k.item(): v.item() for k, v in zip(a, ids)})

Now, if the number of unique elements you want to map is orders of magnitude larger than the array, computing this may shift the value of n at which bucketize becomes faster than apply, since for apply you can simply replace mapping.__getitem__(x) with mapping.get(x, x).
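For reference, a minimal sketch of the apply variant with mapping.get(x, x), using the small mapping from the benchmark and a made-up input containing one value (100) that is not a key:

```python
import torch

a = torch.arange(2, 8)
ids = torch.arange(0, 6)
b = torch.tensor([3, 6, 100, 5, 4])  # 100 is not a key of the mapping

mapping = {k.item(): v.item() for k, v in zip(a, ids)}

# Tensor.apply_ calls the function on every element (CPU tensors only) and
# writes the result back in place, hence the clone(). dict.get(x, x) returns
# x unchanged when it is not a key, so no dictionary pre-filling is needed.
remapped = b.clone().apply_(lambda x: mapping.get(x, x))
print(remapped.tolist())  # [1, 4, 100, 3, 2]
```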

answered Oct 27 '22 19:10

Ramon
