
Get transpose from uneven numpy array and/or get average from uneven numpy array

I have a program that outputs NumPy arrays that look like this, for example:

[[a1, a2],
 [b1],
 [c1, c2, c3]]

Is there an elegant, Pythonic way to turn this into the following?

[[a1, b1, c1],
 [a2, c2],
 [c3]]

The purpose of this is to get the sum/average over the columns without it complaining when some values are missing, so I am happy with something that does this directly. Here is an example for you to copy and paste:

import numpy
# dtype=object is required for ragged input on recent NumPy versions
test = numpy.array([
        numpy.array([3, 5]),
        numpy.array([3.4]),
        numpy.array([2.8, 5.3, 7.1])
], dtype=object)
HcN asked Jun 15 '26 12:06



2 Answers

Because the rows have different lengths, the data is not a true matrix, so you can't benefit from NumPy's vectorized operations. Instead, you can use itertools.zip_longest together with filter to get what you want:

In [13]: import numpy as np

In [14]: test = np.array(
    ...:     [np.array([3, 5]),
    ...:      np.array([3.4]),
    ...:      np.array([2.8, 5.3, 7.1])],
    ...:     dtype=object)  # ragged input needs dtype=object on recent NumPy

In [15]: from itertools import zip_longest

In [16]: [np.fromiter(filter(bool, i), dtype=float) for i in zip_longest(*test)]
Out[16]: [array([3. , 3.4, 2.8]), array([5. , 5.3]), array([7.1])]

Note that using bool as the filter function also drops every falsy item, such as 0 or an empty string, not just the None fill values.

If your arrays might contain such items, filter on None explicitly instead, either with a list comprehension or with a lambda function passed to filter:

[np.array([i for i in sub if i is not None]) for sub in zip_longest(*test)]
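To make the difference between the two filters concrete, here is a minimal sketch using a hypothetical input (`rows`, not from the question) whose first value is 0.0. The names `with_bool`, `with_none_check` and `column_means` are illustrative only.

```python
from itertools import zip_longest

import numpy as np

# Hypothetical ragged input; the leading 0.0 is a legitimate measurement.
rows = [[0.0, 5.0], [3.4], [2.8, 5.3, 7.1]]

# Truthiness filter: drops the None fill values, but also the 0.0.
with_bool = [[x for x in col if x] for col in zip_longest(*rows)]
print(with_bool)        # [[3.4, 2.8], [5.0, 5.3], [7.1]]

# Explicit None check: drops only the fill values.
with_none_check = [[x for x in col if x is not None]
                   for col in zip_longest(*rows)]
print(with_none_check)  # [[0.0, 3.4, 2.8], [5.0, 5.3], [7.1]]

# Per-column averages over the values that are actually present.
column_means = [float(np.mean(col)) for col in with_none_check]
```

With the truthiness filter, the first column average would be computed over two values instead of three, which silently skews the result.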

You might also want to take a look at the roughly equivalent pure-Python implementation of zip_longest in the itertools documentation. Adapting it would let you generate the desired result directly, instead of filtering the fill values out of the list afterwards.
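One way this idea could look is sketched below. It is not the itertools implementation itself, only a generator written in the same spirit; the name `ragged_columns` is hypothetical. It drops each row as soon as it is exhausted, so no fill value is ever produced and no filtering is needed.

```python
def ragged_columns(rows):
    """Yield the columns of a ragged 2-D sequence, skipping missing entries."""
    iterators = [iter(row) for row in rows]
    while iterators:
        column = []
        still_alive = []
        for it in iterators:
            try:
                column.append(next(it))
            except StopIteration:
                continue  # this row has ended; drop it from later columns
            still_alive.append(it)
        if not column:
            return  # every row is exhausted
        iterators = still_alive
        yield column


columns = list(ragged_columns([[3, 5], [3.4], [2.8, 5.3, 7.1]]))
print(columns)  # [[3, 3.4, 2.8], [5, 5.3], [7.1]]
```

Because nothing is filtered by value, entries such as 0 or even None in the data itself survive unchanged.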

Mazdak answered Jun 17 '26 02:06



You lose all the benefits of NumPy arrays when you start treating them as ragged lists. An alternative is to set empty/missing elements to NaN and use the functions prefixed with "nan" in the NumPy suite to compute your statistics. For example, mean maps to nanmean, sum maps to nansum, etc. (the NumPy documentation has the complete list). This has the additional advantage that the position of the gaps does not matter.

If at all possible, have your program create a single array that looks like this:

test = np.array([
    [3.0, 5.0, np.nan],
    [3.4, np.nan, np.nan],
    [2.8, 5.3, 7.1]])

If not, here is a primitive attempt at converting the input:

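def to_full(a):
    # Start from an all-NaN array as wide as the longest row
    output = np.full((len(a), max(map(len, a))), np.nan)
    for i, row in enumerate(a):
        # Copy each row into the left part; the remainder stays NaN
        output[i, :len(row)] = row
    return output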

Now computing the mean is trivial:

mean = np.nanmean(test, axis=0)
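Putting the pieces together on the ragged input from the question might look like the sketch below. `to_full` is repeated from above so the snippet runs on its own; the names `ragged`, `padded`, `col_mean`, `col_sum` and `col_count` are illustrative.

```python
import numpy as np


def to_full(a):
    # Pad every row with NaN up to the length of the longest row.
    output = np.full((len(a), max(map(len, a))), np.nan)
    for i, row in enumerate(a):
        output[i, :len(row)] = row
    return output


# A plain list of arrays avoids building a ragged object array at all.
ragged = [np.array([3, 5]), np.array([3.4]), np.array([2.8, 5.3, 7.1])]

padded = to_full(ragged)
col_mean = np.nanmean(padded, axis=0)           # averages ignoring NaN
col_sum = np.nansum(padded, axis=0)             # sums treating NaN as 0
col_count = np.sum(~np.isnan(padded), axis=0)   # values present per column
```

Here the column sums come out to roughly 9.2, 10.3 and 7.1, with 3, 2 and 1 values contributing to them respectively.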
Mad Physicist answered Jun 17 '26 01:06



