I'm processing really big integers, with 10000 digits, so I split each number into an array of digits.
Small data sample:
import numpy as np
# all combinations with length 3 of values in list L
N = 3
L = [[1,9,0]]*N
a = np.array(np.meshgrid(*L)).T.reshape(-1,N)
# these are numbers, so drop rows with a leading 0; the last digit is always 0
a = a[(a[:, 0] != 0) & (a[:, -1] == 0)]
print (a)
[[1 1 0]
[1 9 0]
[1 0 0]
[9 1 0]
[9 9 0]
[9 0 0]]
Then I need to multiply each number by the scalar 1.1. For better understanding:
# join the digit arrays into numbers
b = np.array([int(''.join(x)) for x in a.astype(str)])[:, None]
print (b)
[[110]
[190]
[100]
[910]
[990]
[900]]
# multiply by constant
c = b * 1.1
print (c)
[[ 121.]
[ 209.]
[ 110.]
[1001.]
[1089.]
[ 990.]]
But because the real numbers have 10000 digits, this solution is not possible due to rounding. So I need a solution that multiplies directly on the digit arrays.
What I tried: moved the last 0 'column' to the front and then summed:
a1 = np.hstack((a[:, [-1]] , a[:, :-1] ))
print (a1)
[[0 1 1]
[0 1 9]
[0 1 0]
[0 9 1]
[0 9 9]
[0 9 0]]
print (a1 + a)
[[ 1 2 1]
[ 1 10 9]
[ 1 1 0]
[ 9 10 1]
[ 9 18 9]
[ 9 9 0]]
But the problem is that if a value is greater than 9, it is necessary to carry into the next digit (like old-school addition on paper). The expected output is:
c1 = np.array([list(str(x).split('.')[0].zfill(4)) for x in np.ravel(c)]).astype(int)
print (c1)
[[0 1 2 1]
[0 2 0 9]
[0 1 1 0]
[1 0 0 1]
[1 0 8 9]
[0 9 9 0]]
Is there a fast vectorized solution for generating the c1 array from the a array?
EDIT: I tried other data for testing and the solution by @yatu raises an error:
ValueError: cannot convert float NaN to integer
import numpy as np
from itertools import product, zip_longest

def grouper(iterable, n, fillvalue=None):
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

# real data
# M = 100000
# N = 500
# loop over chunks of length 5
M = 20
N = 5
v = [0]*M
for i in grouper(product([9, 0], repeat=M), N, v):
    a = np.array(i)
    # print (a)
    # these are numbers, so drop rows with a leading 0; the last digit is always 0
    a = a[(a[:, 0] != 0) & (a[:, -1] == 0)]
    print (a)
    #
    s = np.arange(a.shape[1]-1, -1, -1)
    # concat digits along cols, and multiply
    b = (a * 10**s).sum(1)*1.1
    # highest amount of digits in b
    n_cols = int(np.log10(b.max()))
    # broadcast division to reverse
    c = b[:, None] // 10**np.arange(n_cols, -1, -1)
    # keep only last digit
    c1 = (c%10).astype(int)
    print (c1)
Here's a vectorized approach working from a. The idea is to multiply each column by 10**s, where s is an arange over the number of columns in descending order. Taking the sum along the second axis then acts as a concatenation of the digits along the columns.
Finally, after multiplying by 1.1, we can reverse the process by applying the same logic with division instead, broadcasting to the resulting shape, and taking the result modulo 10 to keep only the last digit:
s = np.arange(a.shape[1]-1, -1, -1, dtype=np.float64)
# concat digits along cols, and multiply
b = (a * 10**s).sum(1)*1.1
# highest amount of digits in b
n_cols = int(np.log10(b.max()))
# broadcast division to reverse
c = b[:, None] // 10**np.arange(n_cols, -1, -1, dtype=np.float64)
# keep only last digit
c1 = (c%10).astype(int)
print(c1)
array([[0, 1, 2, 1],
[0, 2, 0, 9],
[0, 1, 1, 0],
[1, 0, 0, 1],
[1, 0, 8, 9],
[0, 9, 9, 0]])
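To make the reverse step more concrete, here is a small trace of a single row (the digits of 910) through the same operations. It is only an illustrative sketch: the intermediate names `joined` and `divisors` are introduced here and are not part of the code above.

```python
import numpy as np

# One row from the sample: the digits of 910
a = np.array([[9, 1, 0]])

# Descending powers of ten, one per column: [100., 10., 1.]
s = np.arange(a.shape[1] - 1, -1, -1, dtype=np.float64)

# The weighted sum concatenates the digits: 9*100 + 1*10 + 0*1 = 910
joined = (a * 10**s).sum(1)

# Scale by 1.1 (float arithmetic, so the result is only approximately 1001)
b = joined * 1.1

# One divisor per output digit: [1000., 100., 10., 1.]
n_cols = int(np.log10(b.max()))
divisors = 10 ** np.arange(n_cols, -1, -1, dtype=np.float64)

# Floor-dividing by each divisor gives the leading 1, 2, 3 and 4 digits
c = b[:, None] // divisors

# Modulo 10 keeps only the last digit of each prefix
c1 = (c % 10).astype(int)
print(c)
print(c1)
```

Each column of `c` holds a prefix of the result (1, 10, 100, 1001), which is why taking it modulo 10 recovers the individual digits.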
Update:
The above approach only works for integers no larger than the maximum supported by int64, which is:
np.iinfo(np.int64).max
# 9223372036854775807
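As a rough illustration of these limits, the sketch below checks two things using only plain Python and `np.iinfo`: a 20-digit number no longer fits in int64, and float64 (the dtype used for `s` above) stops representing every integer exactly beyond 2**53. The variable names are chosen here purely for illustration.

```python
import numpy as np

# Largest value an int64 can hold
int64_max = np.iinfo(np.int64).max

# The smallest 20-digit number is already out of range for int64
twenty_digits = 10**19
fits_in_int64 = twenty_digits <= int64_max

# float64 has a 53-bit significand, so 2**53 + 1 cannot be stored exactly
big = 2**53 + 1
round_trip = int(float(big))
lost_precision = round_trip != big

print(fits_in_int64)   # False
print(lost_precision)  # True
```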
However, what can be done in such cases is to store the array values as Python int objects rather than a fixed-width NumPy dtype. So we could define both np.arange calls with dtype=object, and the above should work for the shared example:
s = np.arange(a.shape[1]-1, -1, -1, dtype=object)
# concat digits along cols, and multiply
b = (a * 10**s).sum(1)*1.1
# highest amount of digits in b
n_cols = int(np.log10(b.max()))
# broadcast division to reverse
c = b[:, None] // 10**np.arange(n_cols, -1, -1, dtype=object)
# keep only last digit
c1 = (c%10).astype(int)
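For inputs with thousands of digits, the multiplication by the float 1.1 is itself a likely source of trouble. One possible alternative, sketched below, stays entirely in digit arrays. It is not the approach above but a different technique: since every number ends in 0, x * 1.1 equals x + x / 10, and x / 10 is just the digits shifted one place to the right. The per-digit sums are then fixed up with a right-to-left carry pass that loops over columns but stays vectorized across rows. The helper name `times_1_1_digits` is hypothetical.

```python
import numpy as np

def times_1_1_digits(a):
    """Hypothetical helper: digit rows of x * 1.1, for digit rows of x ending in 0."""
    a = np.asarray(a, dtype=np.int64)
    if np.any(a[:, -1] != 0):
        raise ValueError("every number must end in 0 so that x / 10 is an integer")
    rows, n = a.shape
    zero = np.zeros((rows, 1), dtype=np.int64)

    # x with one extra leading column for a possible final carry
    x = np.hstack((zero, a))
    # x / 10: drop the trailing 0 and right-align in the same width
    x_div_10 = np.hstack((zero, zero, a[:, :-1]))

    # Per-digit sums; each entry is at most 18 before carrying
    total = x + x_div_10

    # Propagate carries from the last column towards the first
    for col in range(n, 0, -1):
        carry = total[:, col] // 10
        total[:, col] %= 10
        total[:, col - 1] += carry
    return total

a = np.array([[1, 1, 0], [1, 9, 0], [1, 0, 0],
              [9, 1, 0], [9, 9, 0], [9, 0, 0]])
print(times_1_1_digits(a))
```

On the small sample this reproduces the expected c1 array. The loop runs once per digit column, so for 10000-digit numbers it performs 10000 cheap vectorized steps instead of building any huge integers or floats.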