join function on a numpy array composed of strings

I'm trying to use the join function on a numpy array composed of only strings (representing binary floats) to get the joined string in order to use the numpy.fromstring function, but the join function doesn't seem to work properly.

Any idea why? Which alternative function can I use to do that?

Here is a standalone example to show my problem:

import numpy as np

nb_el = 10

table = np.arange(nb_el, dtype='float64')
print table

binary = table.tostring()

binary_list = map(''.join, zip(*[iter(binary)] * table.dtype.itemsize))
print 'len binary list :', len(binary_list)
# len binary list : 10

join_binary_list = ''.join(binary_list)
print np.fromstring(join_binary_list, dtype='float64')
# [ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9.]

binary_split_array = np.array(binary_list)
print 'nb el :', binary_split_array.shape
# nb el : (10,)
print 'nb_el * size :', binary_split_array.shape[0] * binary_split_array.dtype.itemsize
# nb_el * size : 80

join_binary_split_array = ''.join(binary_split_array)
print 'len binary array :', len(join_binary_split_array)
# len binary array : 72

table_fromstring = np.fromstring(join_binary_split_array, dtype='float64')
print table_fromstring
# [ 1.  2.  3.  4.  5.  6.  7.  8.  9.]

As you can see, using the join function on the list (binary_list) works properly, but on the equivalent numpy array (binary_split_array) it doesn't: we can see the string returned is only 72 characters long instead of 80.

asked May 20 '15 16:05 by Thomas Leonard


1 Answer

The first element of your binary_split_array is an empty string:

print(repr(binary_split_array[0]))    
''

The first element in your list is:

'\x00\x00\x00\x00\x00\x00\x00\x00'

An empty string has a length of 0:

print([len("".join(a)) for a in binary_split_array])
print([len("".join(a)) for a in binary_list])
[0, 8, 8, 8, 8, 8, 8, 8, 8, 8]
[8, 8, 8, 8, 8, 8, 8, 8, 8, 8]

The length of that string of eight null bytes is 8:

print(len('\x00\x00\x00\x00\x00\x00\x00\x00'))
8

Calling tobytes will give the same output length as the list:

print(len(binary_split_array.tobytes()))
80

table_fromstring = np.fromstring(binary_split_array.tobytes(), dtype='float64')

print table_fromstring
[ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9.]
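On Python 3 with a recent NumPy, tostring and the binary mode of fromstring are deprecated, so the same recovery can be sketched with tobytes and np.frombuffer. This is a minimal sketch, not the original poster's code: the name chunks and the explicit little-endian dtype '<f8' are assumptions made here so the byte layout is the same on every machine.

```python
import numpy as np

# Little-endian float64, chosen explicitly (assumption) so the bytes are predictable.
table = np.arange(10, dtype='<f8')
itemsize = table.dtype.itemsize          # 8 bytes per float64

# Split the raw buffer into 8-byte chunks, as the question does with zip/iter.
raw = table.tobytes()
chunks = [raw[i:i + itemsize] for i in range(0, len(raw), itemsize)]

# NumPy infers a fixed-width bytes dtype ('S8') for this array.
binary_split_array = np.array(chunks)

# tobytes() returns the underlying buffer, null bytes included.
buffer = binary_split_array.tobytes()
restored = np.frombuffer(buffer, dtype='<f8')

print(len(buffer))
print(restored)
```

np.frombuffer returns a read-only view of the bytes object; call restored.copy() if a writable array is needed.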

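NumPy's fixed-width string dtype handles null bytes differently from Python: trailing null bytes are stripped when an element is read back. The first chunk (the float 0.0) is eight null bytes, so it comes back as an empty string, while tobytes returns the underlying buffer unchanged.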
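The stripping can be seen in isolation with two hand-built chunks. The following is a small Python 3 sketch; the variable names and the use of struct.pack to build the chunks are illustrative assumptions, not part of the original code.

```python
import struct
import numpy as np

# Raw little-endian bytes of two float64 values.
zero = struct.pack('<d', 0.0)   # eight null bytes
one = struct.pack('<d', 1.0)    # ends in a non-null byte

arr = np.array([zero, one])     # NumPy infers the fixed-width dtype 'S8'

# Reading elements back strips trailing nulls, so 0.0 collapses to b''.
stripped_lengths = [len(x) for x in arr.tolist()]

# The stored buffer still holds all 16 bytes.
raw_length = len(arr.tobytes())

# Only trailing nulls are dropped; a null inside the value survives.
inner = np.array([b'a\x00b\x00\x00']).tolist()[0]

print(stripped_lengths, raw_length, inner)
```

Because the data is still intact in the buffer, nothing is lost as long as the bytes are taken from tobytes rather than rebuilt element by element.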

answered Oct 26 '22 00:10 by Padraic Cunningham