
Weird behaviour initializing a numpy array of string data

Tags:

python

numpy

I am having some seemingly trivial trouble with numpy when the array contains string data. I have the following code:

import numpy

my_array = numpy.empty([1, 2], dtype=str)
my_array[0, 0] = "Cat"
my_array[0, 1] = "Apple"

Now, when I print it with print my_array[0, :], the response I get is ['C', 'A'], which is clearly not the expected output of Cat and Apple. Why is that, and how can I get the right output?

Thanks!

Jim asked Dec 05 '12 06:12



2 Answers

NumPy requires string arrays to have a fixed maximum length. When you create an empty array with dtype=str, it sets this maximum length to 1 by default. You can see this by checking my_array.dtype: it shows "|S1" on Python 2 (or "<U1" on Python 3), meaning "one-character string". Subsequent assignments into the array are silently truncated to fit that length, which is why only the first character of each value survives.
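To make the truncation concrete, here is a minimal sketch that reproduces the question's array and inspects its dtype. It assumes Python 3 and a recent NumPy, where dtype=str maps to a Unicode dtype; the variable names first and second are only for illustration.

```python
import numpy as np

# dtype=str with no explicit length yields a one-character string dtype.
my_array = np.empty([1, 2], dtype=str)
print(my_array.dtype)  # "<U1" on Python 3 (little-endian); "|S1" on Python 2

# Longer values are cut down to the fixed length without any warning.
my_array[0, 0] = "Cat"
my_array[0, 1] = "Apple"

first, second = str(my_array[0, 0]), str(my_array[0, 1])
print(first, second)  # C A
```

No exception or warning is raised on assignment, so the only visible symptom is the shortened output.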

You can pass an explicit datatype with your maximum length by doing, e.g.:

my_array = numpy.empty([1, 2], dtype="S10") 

The "S10" creates an array of strings up to 10 bytes long. You have to choose a length large enough for the longest value you want to store, because anything longer is truncated.
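As a rough sketch of the fixed-length approach, the snippet below stores the question's values in an "S10" array and then shows what happens to a value that exceeds the limit. It assumes Python 3, where "S" dtypes hold bytes rather than str; the names values and too_long are illustrative only.

```python
import numpy as np

# Each element can hold up to 10 bytes.
my_array = np.empty([1, 2], dtype="S10")
my_array[0, 0] = "Cat"
my_array[0, 1] = "Apple"

values = my_array[0, :].tolist()
print(values)  # [b'Cat', b'Apple'] on Python 3

# A value longer than 10 bytes is still cut off at the limit.
my_array[0, 0] = "Hippopotamus"
too_long = my_array[0, 0]
print(too_long)  # b'Hippopotam'
```

If the values are known up front, numpy.array(["Cat", "Apple"]) infers a suitable length on its own.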

BrenBarn answered Sep 22 '22 03:09


I got a "codec error" when I tried to store a non-ASCII character in an array created with dtype="S10".

With "S10" you also get an array of byte strings (bytes objects on Python 3) rather than text strings, which confused me.

I think it is better to use:

my_array = numpy.empty([1, 2], dtype="<U10")

Here '<U10' means "Unicode string of up to 10 characters, little-endian byte order".
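The difference between the two dtypes can be sketched as follows, assuming Python 3. The sample word "Käse" and the names bytes_array and ascii_only are made up for illustration; the byte-order prefix is left off so NumPy picks the native one.

```python
import numpy as np

# A Unicode dtype stores text, including non-ASCII characters.
my_array = np.empty([1, 2], dtype="U10")
my_array[0, 0] = "Käse"
my_array[0, 1] = "Apple"
print(my_array[0, :])  # ['Käse' 'Apple']

# A bytes dtype encodes str values as ASCII, so the same assignment
# is expected to fail with a Unicode (codec) error.
bytes_array = np.empty(1, dtype="S10")
try:
    bytes_array[0] = "Käse"
    ascii_only = False
except UnicodeError:
    ascii_only = True
print(ascii_only)
```

Note that a "U10" element takes 40 bytes (4 bytes per character), compared with 10 bytes for "S10".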

Johny White answered Sep 23 '22 03:09