I'm trying to use the join function on a numpy array composed only of strings (each holding the binary representation of a float) to get the joined string, which I then want to pass to the numpy.fromstring function, but the join function doesn't seem to work properly.
Any idea why? Which alternative function can I use to do that?
Here is a standalone example to show my problem:
import numpy as np
nb_el = 10
table = np.arange(nb_el, dtype='float64')
print table
binary = table.tostring()
binary_list = map(''.join, zip(*[iter(binary)] * table.dtype.itemsize))
print 'len binary list :', len(binary_list)
# len binary list : 10
join_binary_list = ''.join(binary_list)
print np.fromstring(join_binary_list, dtype='float64')
# [ 0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]
binary_split_array = np.array(binary_list)
print 'nb el :', binary_split_array.shape
# nb el : (10,)
print 'nb_el * size :', binary_split_array.shape[0] * binary_split_array.dtype.itemsize
# nb_el * size : 80
join_binary_split_array = ''.join(binary_split_array)
print 'len binary array :', len(join_binary_split_array)
# len binary array : 72
table_fromstring = np.fromstring(join_binary_split_array, dtype='float64')
print table_fromstring
# [ 1. 2. 3. 4. 5. 6. 7. 8. 9.]
As you can see, using the join function on the list (binary_list) works properly, but on the equivalent numpy array (binary_split_array) it doesn't: the returned string is only 72 characters long instead of 80.
The first element of your binary_split_array is an empty string:
print(repr(binary_split_array[0]))
''
The first element in your list is:
'\x00\x00\x00\x00\x00\x00\x00\x00'
An empty string has a length of 0:
print([len("".join(a)) for a in binary_split_array])
print([len("".join(a)) for a in binary_list])
[0, 8, 8, 8, 8, 8, 8, 8, 8, 8]
[8, 8, 8, 8, 8, 8, 8, 8, 8, 8]
The string of eight null bytes from the list has a length of 8:
print(len('\x00\x00\x00\x00\x00\x00\x00\x00'))
8
Calling tobytes will give the same output length as the list:
print(len(binary_split_array.tobytes()))
80
table_fromstring = np.fromstring(binary_split_array.tobytes(), dtype='float64')
print(table_fromstring)
[ 0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]
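NumPy handles null bytes differently from Python: its fixed-width string dtype pads elements with null bytes, so trailing null bytes are stripped whenever an element is read back. The float 0.0 is stored as eight null bytes, which is why the first element comes back as an empty string and the joined string is 8 characters short. The array's underlying buffer still holds all 80 bytes, which is why tobytes returns the full data.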
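The behaviour described above can be reproduced with a short sketch. It is written for Python 3 (where the binary data is `bytes` rather than `str`) and makes a few assumptions that are not in the original code: the dtype is pinned to little-endian `'<f8'` so the byte layout is the same on every platform, the string array is created with an explicit `'S8'` dtype, and `np.frombuffer` is used in place of `np.fromstring`, which is deprecated for binary input in recent NumPy versions.

```python
import numpy as np

# Ten little-endian float64 values; '<f8' is pinned here (an assumption)
# so the byte layout does not depend on the platform.
table = np.arange(10, dtype='<f8')
raw = table.tobytes()  # 80 bytes in total

# Split the buffer into one 8-byte chunk per float.
chunks = [raw[i:i + 8] for i in range(0, len(raw), 8)]

# Fixed-width byte-string array, 8 bytes per element.
arr = np.array(chunks, dtype='S8')

# Reading an element strips trailing null bytes, so 0.0 comes back empty.
first_from_list = chunks[0]  # eight null bytes
first_from_array = arr[0]    # empty byte string

# Joining the elements therefore loses the 8 bytes of the first float.
joined = b''.join(arr)

# The underlying buffer is intact, so tobytes() round-trips correctly.
restored = np.frombuffer(arr.tobytes(), dtype='<f8')

print(len(first_from_list), len(first_from_array))
print(len(joined), len(arr.tobytes()))
print(restored)
```

If the chunks are only needed to rebuild the float array, joining the plain Python list or calling `tobytes()` on the array avoids the truncation entirely.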