I am having some seemingly trivial trouble with numpy when the array contains string data. I have the following code:
my_array = numpy.empty([1, 2], dtype=str)
my_array[0, 0] = "Cat"
my_array[0, 1] = "Apple"
Now, when I print it with print my_array[0, :], the response I get is ['C', 'A'], which is clearly not the expected output of Cat and Apple. Why is that, and how can I get the right output?
Thanks!
NumPy requires string arrays to have a fixed maximum length. When you create an empty array with dtype=str, it sets this maximum length to 1 by default. You can see this by checking my_array.dtype; it will show "|S1", meaning "one-character string" (on Python 3, where str is Unicode, it shows "<U1" instead). Subsequent assignments into the array are truncated to fit this structure.
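To see the default width for yourself, here is a minimal sketch of the code from the question. It assumes Python 3 and imports numpy under the alias np, which is my choice and not part of the question.

```python
import numpy as np

# dtype=str with no explicit width gives a one-character string dtype
# ("<U1" on Python 3; the "|S1" mentioned above is what Python 2 reports).
my_array = np.empty([1, 2], dtype=str)
print(my_array.dtype)

# Each assignment is silently cut down to the first character.
my_array[0, 0] = "Cat"
my_array[0, 1] = "Apple"
print(my_array[0, :])
```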
You can pass an explicit datatype with your maximum length by doing, e.g.:
my_array = numpy.empty([1, 2], dtype="S10")
The "S10" will create an array of length-10 strings. You have to decide how big will be big enough to hold all the data you want to hold.
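As a rough illustration of what "big enough" means, this sketch stores one value that fits in ten characters and one that does not. It assumes Python 3, where an "S" array holds byte strings; the word "Pomegranate" is only an example value I picked because it is 11 characters long.

```python
import numpy as np

my_array = np.empty([1, 2], dtype="S10")

my_array[0, 0] = "Cat"          # 3 characters: fits
my_array[0, 1] = "Pomegranate"  # 11 characters: one more than the array can hold

# The second value is silently truncated to its first 10 characters.
print(my_array[0, :])
```

If you cannot predict the longest value in advance, pick a generous width or compute it from the data before creating the array.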
I got a "codec error" when I tried to use a non-ASCII character with dtype="S10".
You also get an array of binary (byte) strings, which confused me.
I think it is better to use:
my_array = numpy.empty([1, 2], dtype="<U10")
Here '<U10' translates to "Unicode string of length 10; little-endian format".
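A small sketch of the difference, assuming Python 3. The word "Äpfel" is just a sample non-ASCII value, and encode_failed is a hypothetical flag I added so the outcome is easy to inspect.

```python
import numpy as np

# Unicode array: non-ASCII text is stored as-is.
my_array = np.empty([1, 2], dtype="<U10")
my_array[0, 0] = "Cat"
my_array[0, 1] = "Äpfel"
print(my_array[0, :])

# Byte-string array: NumPy has to encode the text as ASCII first,
# which fails for characters such as "Ä".
bytes_array = np.empty([1, 2], dtype="S10")
try:
    bytes_array[0, 0] = "Äpfel"
    encode_failed = False
except UnicodeError:
    encode_failed = True
print(encode_failed)
```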