Can I use more than 26 letters in `numpy.einsum`?

Tags:

I am using np.einsum to multiply probability tables like:

np.einsum('ijk,jklm->ijklm', A, B)

The issue is that I am dealing with more than 26 random variables (axes) overall, so if I assign each random variable a letter I run out of letters. Is there another way I can specify the above operation to avoid this issue, without resorting to a mess of np.sum and np.dot operations?

677

asked Jun 13 '16 15:06

akxlr

2 Answers

The short answer is, you can use any of the 52 letters (upper and lower). That's all the letters in the English language. Any fancier axes names will have to be mapped on those 52, or an equivalent set of numbers. Practically speaking you will want to use a fraction of those 52 in any one einsum call.

@kennytm suggests using the alternative input syntax. A few sample runs suggests that this is not a solution. 26 is still the practical limit (despite the suspicious error messages).

In [258]: np.einsum(np.ones((2,3)),[0,20],np.ones((3,4)),[20,2],[0,2])
Out[258]: 
array([[ 3.,  3.,  3.,  3.],
       [ 3.,  3.,  3.,  3.]])

In [259]: np.einsum(np.ones((2,3)),[0,27],np.ones((3,4)),[27,2],[0,2])
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-259-ea61c9e50d6a> in <module>()
----> 1 np.einsum(np.ones((2,3)),[0,27],np.ones((3,4)),[27,2],[0,2])

ValueError: invalid subscript '|' in einstein sum subscripts string, subscripts must be letters

In [260]: np.einsum(np.ones((2,3)),[0,100],np.ones((3,4)),[100,2],[0,2])
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-260-ebd9b4889388> in <module>()
----> 1 np.einsum(np.ones((2,3)),[0,100],np.ones((3,4)),[100,2],[0,2])

ValueError: subscript is not within the valid range [0, 52]

I'm not entirely sure why you need more than 52 letters (upper and lower case), but I'm sure you need to do some sort of mapping. You don't want to write an einsum string using more than 52 axes all at once. The resulting iterator would be too large (for memory or time).

I'm picturing some sort of mapping function that can be used as:

 astr = foo(A.names, B.names)
 # foo(['i','j','k'],['j','k','l','m'])
 # foo(['a1','a2','a3'],['a2','a3','b4','b5'])
 np.einsum(astr, A, B)

https://github.com/hpaulj/numpy-einsum/blob/master/einsum_py.py

is a Python version of einsum. Crudely speaking einsum parses the subscripts string, creating an op_axes list that can be used in np.nditer to set up the required sum-of-products calculation. With this code I can look at how the translation is done:

From an example in the __name__ block:

    label_str, op_axes = parse_subscripts('ik,kj->ij',    Labels([A.ndim,B.ndim]))
    print op_axes
    # [[0, -1, 1], [-1, 1, 0], [0, 1, -1]]  fine
    # map (4,newaxis,3)(newaxis,3,2)->(4,2,newaxis)
    print sum_of_prod([A,B],op_axes)

Your example, with full diagnostic output is

In [275]:  einsum_py.parse_subscripts('ijk,jklm->ijklm',einsum_py.Labels([3,4])) 
jklm
{'counts': {105: 1, 106: 2, 107: 2, 108: 1, 109: 1}, 
 'strides': [], 
 'num_labels': 5, 
 'min_label': 105, 
 'nop': 2, 
 'ndims': [3, 4], 
 'ndim_broadcast': 0, 
 'shapes': [], 
 'max_label': 109}
[('ijk', [105, 106, 107], 'NONE'), 
 ('jklm', [106, 107, 108, 109], 'NONE')]
 ('ijklm', [105, 106, 107, 108, 109], 'NONE')
iter labels: [105, 106, 107, 108, 109],'ijklm'
op_axes [[0, 1, 2, -1, -1], [-1, 0, 1, 2, 3], [0, 1, 2, 3, 4]]

Out[275]: 
(<einsum_py.Labels at 0xb4f80cac>,
 [[0, 1, 2, -1, -1], [-1, 0, 1, 2, 3], [0, 1, 2, 3, 4]])

Using 'ajk,jkzZ->ajkzZ' changes labels, but results in the same op_axes.

Here is a first draft of a translation function. It should work for any list of lists (of hashable items):

def translate(ll):
    mset=set()
    for i in ll: 
        mset.update(i)
    dd={k:v for v,k in enumerate(mset)}
    x=[''.join([chr(dd[i]+97) for i in l]) for l in ll]
    #  ['cdb', 'dbea', 'cdbea']
    y=','.join(x[:-1])+'->'+x[-1]
    # 'cdb,dbea->cdbea'

In [377]: A=np.ones((3,1,2),int)
In [378]: B=np.ones((1,2,4,3),int)
In [380]: ll=[list(i) for i in ['ijk','jklm','ijklm']]
In [381]: y=translate(ll)
In [382]: y
Out[382]: 'cdb,dbea->cdbea'

In [383]: np.einsum(y,A,B).shape
Out[383]: (3, 1, 2, 4, 3)

The use of set to map index objects means that the final indexing characters are unordered. As long as you specify the RHS that shouldn't be an issue. Also I ignored ellipsis.

=================

The list version of einsum input is converted to the subscript string version in einsum_list_to_subscripts() (in numpy/core/src/multiarray/multiarraymodule.c). It replace ELLIPSIS with '...'. It raised the [0,52] error message if ( s < 0 || s > 2*26) where s is a number in one of those sublists. And converts s to string with

        if (s < 26) {
            subscripts[subindex++] = 'A' + s;
        }
        else {
            subscripts[subindex++] = 'a' + s;

But it looks like the 2nd case is not working; I get errors like for 26.

ValueError: invalid subscript '{' in einstein sum subscripts string, subscripts must be letters

That 'a'+s is wrong if s>26:

In [424]: ''.join([chr(ord('A')+i) for i in range(0,26)])
Out[424]: 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

In [425]: ''.join([chr(ord('a')+i) for i in range(0,26)])
Out[425]: 'abcdefghijklmnopqrstuvwxyz'

In [435]: ''.join([chr(ord('a')+i) for i in range(26,52)])
Out[435]: '{|}~\x7f\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90\x91\x92\x93\x94'

That 'a'+s is wrong; is should be:

In [436]: ''.join([chr(ord('a')+i-26) for i in range(26,52)])
Out[436]: 'abcdefghijklmnopqrstuvwxyz'

I submitted https://github.com/numpy/numpy/issues/7741

The existence of this bug after all this time indicates that the sublist format is not common, and that using large numbers in that list is even less frequent.

answered Oct 12 '22 23:10

hpaulj

You could use the einsum(op0, sublist0, op1, sublist1, ..., [sublistout]) form instead of i,j,ik->ijk, which the API is not restricted to 52 axes*. How this verbose form corresponds to the ijk form are shown in the documentation.

OP's

np.einsum('ijk,jklm->ijklm', A, B)

would be written as

np.einsum(A, [0,1,2], B, [1,2,3,4], [0,1,2,3,4])

(* Note: The implementation is still restricted to 26 axes. See @hpaulj's answer and his bug report for explanation)

Equivalences from numpy's examples:

>>> np.einsum('ii', a)
>>> np.einsum(a, [0,0])

>>> np.einsum('ii->i', a)
>>> np.einsum(a, [0,0], [0])

>>> np.einsum('ij,j', a, b)
>>> np.einsum(a, [0,1], b, [1])

>>> np.einsum('ji', c)
>>> np.einsum(c, [1,0])

>>> np.einsum('..., ...', 3, c)
>>> np.einsum(3, [...], c, [...])

>>> np.einsum('i,i', b, b)
>>> np.einsum(b, [0], b, [0])

>>> np.einsum('i,j', np.arange(2)+1, b)
>>> np.einsum(np.arange(2)+1, [0], b, [1])

>>> np.einsum('i...->...', a)
>>> np.einsum(a, [0, ...], [...])

>>> np.einsum('ijk,jil->kl', a, b)
>>> np.einsum(a, [0,1,2], b, [1,0,3], [2,3])

answered Oct 13 '22 00:10

kennytm

Related questions
                            
                                reduce python dictionary values if key meets condition
                            
                                How to draw bar chart using Plotly Offline mode in python?
                            
                                How to remove empty "padding" in matplotlib barh plot?
                            
                                Pandas to_sql gives ValueError with timezone-aware column
                            
                                PEP440 compliant AND git describe info available from deployed package
                            
                                Convert module to dict
                            
                                OpenCV will not load a big image (~4GB)
                            
                                Matplotlib bar plot remove internal lines
                            
                                Beautiful Soup 'ResultSet' object has no attribute 'text'
                            
                                How to perform os.environ join in python?
                            
                                AWS Lambda Python libssl C Library
                            
                                Concatenate two numerical values to make a new column using pandas?
                            
                                ValueError: jpeg is required unless explicitly disabled using --disable-jpeg, aborting
                            
                                How to use Gevents with Falcon?
                            
                                Array of characters in python 3?
                            
                                How to use Plotly with Zeppelin
                            
                                How do you add environment variables to your django project
                            
                                How to check if a file is already opened (in the same process)
                            
                                Python: Pandas Dataframe AttributeError: 'numpy.ndarray' object has no attribute 'fillna'
                            
                                ReportLab Paragraph and text formatting

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Can I use more than 26 letters in `numpy.einsum`?

Tags:

python

numpy

numpy-einsum

akxlr

People also ask

2 Answers

hpaulj

kennytm

Recent Activity

Donate For Us