Weird behaviour initializing a numpy array of string data

Tags:

python

numpy

I am having some seemingly trivial trouble with numpy when the array contains string data. I have the following code:

import numpy
my_array = numpy.empty([1, 2], dtype=str)
my_array[0, 0] = "Cat"
my_array[0, 1] = "Apple"

Now, when I print it with print my_array[0, :], the response I get is ['C', 'A'], which is clearly not the expected output of Cat and Apple. Why is that, and how can I get the right output?

Thanks!


asked Dec 05 '12 06:12

Jim

2 Answers

NumPy requires string arrays to have a fixed maximum length. When you create an empty array with dtype=str, it sets this maximum length to 1 by default. You can see this by checking my_array.dtype: under Python 2 it shows "|S1", meaning "one-character byte string", and under Python 3, where str is Unicode, it shows "<U1", a one-character Unicode string. Subsequent assignments into the array are silently truncated to fit this width.
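A minimal sketch reproducing the truncation, assuming Python 3 (the question uses Python 2 print syntax, where the dtype would show as "|S1" instead) and the conventional np alias for NumPy:

```python
import numpy as np

# Same setup as the question, in Python 3 syntax.
my_array = np.empty([1, 2], dtype=str)
print(my_array.dtype)   # <U1 on a little-endian machine: one character per element

my_array[0, 0] = "Cat"
my_array[0, 1] = "Apple"
print(my_array[0, :])   # ['C' 'A']: each value was cut down to its first character
```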

You can pass an explicit datatype with your maximum length by doing, e.g.:

my_array = numpy.empty([1, 2], dtype="S10")

The "S10" dtype creates an array of fixed-width strings of up to 10 characters (bytes) each. You have to decide how long is long enough to hold all the data you want to store; anything longer is still truncated.
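A sketch of the "S10" version, assuming Python 3 and the np alias. Under Python 3 the elements come back as byte strings, and a value longer than the declared width is still cut off without any error:

```python
import numpy as np

# Each element can hold up to 10 bytes.
my_array = np.empty([1, 2], dtype="S10")
my_array[0, 0] = "Cat"     # the str is encoded to bytes on assignment
my_array[0, 1] = "Apple"
print(my_array[0, :])      # [b'Cat' b'Apple']

# A 12-character value is silently truncated to the first 10 bytes.
my_array[0, 0] = "Hippopotamus"
print(my_array[0, 0])      # b'Hippopotam'
```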

answered Sep 22 '22 03:09

BrenBarn

I got a "codec error" (a UnicodeEncodeError) when I tried to store a non-ASCII character with dtype="S10".

You also get an array of byte strings (b'Cat' rather than 'Cat'), which confused me.
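A small sketch of both effects, assuming Python 3 and the np alias; the names failed, value and text are illustrative only. NumPy encodes a str as ASCII when storing it in an "S" array, which is where the codec error comes from:

```python
import numpy as np

arr = np.empty(1, dtype="S10")

# A non-ASCII character cannot be encoded as ASCII, so the assignment fails.
try:
    arr[0] = "Café"
    failed = False
except UnicodeEncodeError as exc:
    failed = True
    print("UnicodeEncodeError:", exc)

# ASCII text is accepted, but it reads back as bytes, not str.
arr[0] = "Cat"
value = arr[0]
text = value.decode("ascii")  # convert back to a regular str
print(text)                   # Cat
```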

I think it is better to use:

my_array = numpy.empty([1, 2], dtype="<U10")

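Here "<U10" translates to "Unicode string of at most 10 characters, little-endian byte order": "U" is the Unicode string kind, "10" is the maximum number of characters, and "<" marks little-endian.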
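A sketch of the "<U10" version, assuming Python 3. The second half is a hypothetical alternative to guessing the width: it measures the longest string in an illustrative list called data and builds the dtype from that length.

```python
import numpy as np

# Unicode strings of up to 10 characters each.
my_array = np.empty([1, 2], dtype="<U10")
my_array[0, 0] = "Café"   # non-ASCII text is fine in a Unicode dtype
my_array[0, 1] = "Apple"
print(my_array[0, :])     # ['Café' 'Apple']

# Size the dtype from the data instead of picking a number up front.
data = ["Cat", "Apple", "Hippopotamus"]
width = max(len(s) for s in data)          # 12
sized = np.array(data, dtype=f"U{width}")
print(sized)              # ['Cat' 'Apple' 'Hippopotamus']
```

Passing the list to np.array without a dtype would also give a 12-character Unicode dtype here, since NumPy sizes an inferred string dtype to the longest element.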

answered Sep 23 '22 03:09

Johny White
