I have a program that outputs numpy arrays that look like this, for example:
[[a1, a2],
[b1],
[c1, c2, c3]]
Is there an elegant, Pythonic way to turn this into this?
[[a1, b1, c1],
[a2, c2],
[c3]]
The purpose of this is to get the sum/average over the columns without it complaining if some values are missing, so I am happy with something that can do this directly. Here is an example for you to copy and paste:
import numpy
test = numpy.array([
    numpy.array([3, 5]),
    numpy.array([3.4]),
    numpy.array([2.8, 5.3, 7.1])
], dtype=object)
Since you don't have a matrix, you can't benefit from NumPy's vectorized functionality. Instead, you can use itertools.zip_longest and filter as follows to get what you want:
In [13]: import numpy as np
In [14]: test = np.array(
    ...:     [np.array([3, 5]),
    ...:      np.array([3.4]),
    ...:      np.array([2.8, 5.3, 7.1])],
    ...:     dtype=object)
    ...:
In [15]: from itertools import zip_longest
In [16]: [np.fromiter(filter(bool, i), dtype=float) for i in zip_longest(*test)]
Out[16]: [array([3. , 3.4, 2.8]), array([5. , 5.3]), array([7.1])]
Note that using bool as the filtering function also eliminates items such as 0 or an empty string, whose bool value is False.
If your arrays might contain such items, you can use another list comprehension, or a lambda function with filter, that drops only None:
[np.array([i for i in sub if i is not None]) for sub in zip_longest(*test)]
You might also want to take a look at the roughly equivalent implementation of zip_longest given in the itertools documentation so that, if possible, you can generate the desired result in the first place, before returning that list.
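The zip_longest approach can also be made safe for zeros by padding with a dedicated sentinel object instead of None. The following is only a minimal sketch, assuming each row is a 1-D numeric sequence; transpose_ragged and _MISSING are hypothetical names chosen for illustration, not part of NumPy or itertools:

```python
from itertools import zip_longest

import numpy as np

# Unique marker object (hypothetical name): it cannot collide with real
# data such as 0, None, or an empty string.
_MISSING = object()


def transpose_ragged(rows):
    """Return the columns of a ragged sequence of rows as float arrays."""
    columns = []
    for column in zip_longest(*rows, fillvalue=_MISSING):
        # Keep every real value, including zeros; drop only the padding.
        values = [value for value in column if value is not _MISSING]
        columns.append(np.array(values, dtype=float))
    return columns


# Example input with a zero that filter(bool, ...) would have discarded.
rows = [np.array([3, 0]), np.array([3.4]), np.array([2.8, 5.3, 7.1])]
columns = transpose_ragged(rows)
column_means = [float(c.mean()) for c in columns]
```

Here the second column keeps its 0, so its mean is computed over both 0 and 5.3 rather than over 5.3 alone.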
You lose all the benefits of numpy arrays when you start treating them as ragged lists. An alternative is to set empty/missing elements to NaN, and use the functions prefixed with "nan" in the numpy suite to compute your statistics. For example, mean maps to nanmean, sum maps to nansum, etc. (the NumPy documentation has the complete list). This has the additional advantage that the order of the gaps does not matter.
If at all possible, have your program create a single array that looks like this:
test = np.array([
[3.0, 5.0, np.nan],
[3.4, np.nan, np.nan],
[2.8, 5.3, 7.1]])
If not, here is a primitive attempt at converting the input:
def to_full(a):
    output = np.full((len(a), max(map(len, a))), np.nan)
    for i, row in enumerate(a):
        output[i, :len(row)] = row
    return output
Now computing the mean is trivial:
mean = np.nanmean(test, axis=0)
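Putting the pieces together, the NaN-padding route might look like the sketch below. It repeats the conversion function so that it runs on its own, and it assumes the input is a list of 1-D numeric arrays; the names ragged, full, col_mean, col_sum and col_count are chosen here purely for illustration:

```python
import numpy as np


def to_full(rows):
    """Pad a ragged sequence of 1-D rows with NaN into one 2-D float array."""
    width = max(map(len, rows))
    output = np.full((len(rows), width), np.nan)
    for i, row in enumerate(rows):
        output[i, :len(row)] = row
    return output


ragged = [np.array([3, 5]), np.array([3.4]), np.array([2.8, 5.3, 7.1])]
full = to_full(ragged)

# NaN-aware reductions skip the padded entries in each column.
col_mean = np.nanmean(full, axis=0)   # approximately [3.0667, 5.15, 7.1]
col_sum = np.nansum(full, axis=0)     # approximately [9.2, 10.3, 7.1]

# Number of real (non-NaN) values that contributed to each column.
col_count = np.sum(~np.isnan(full), axis=0)  # [3, 2, 1]
```

Keeping the per-column count alongside the mean makes it easy to see how many values each statistic is actually based on.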