I need to decode, with Python 3, a string that was encoded the following way:
>>> s = numpy.asarray(numpy.string_("hello\nworld"))
>>> s
array(b'hello\nworld', dtype='|S11')
I tried:
>>> str(s)
"b'hello\\nworld'"
>>> s.decode()
AttributeError Traceback (most recent call last)
<ipython-input-31-7f8dd6e0676b> in <module>()
----> 1 s.decode()
AttributeError: 'numpy.ndarray' object has no attribute 'decode'
>>> s[0].decode()
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-34-fae1dad6938f> in <module>()
----> 1 s[0].decode()
IndexError: 0-d arrays can't be indexed
Another option is the np.char collection of string operations.
In [255]: np.char.decode(s)
Out[255]:
array('hello\nworld', dtype='<U11')
np.char.decode accepts an encoding keyword if you need one. But a plain .astype conversion is probably simpler if you don't.
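For example, a minimal sketch of both routes (the utf-8 encoding here is only an illustration; use whatever encoding your bytes actually are):

>>> import numpy as np
>>> s = np.asarray(np.string_("hello\nworld"))
>>> np.char.decode(s, encoding='utf-8')  # explicit encoding
array('hello\nworld', dtype='<U11')
>>> s.astype(str)  # simple dtype conversion, fine for ASCII data
array('hello\nworld', dtype='<U11')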
This s is 0d (shape ()), so it needs to be indexed with s[()].
In [268]: s[()]
Out[268]: b'hello\nworld'
In [269]: s[()].decode()
Out[269]: 'hello\nworld'
s.item() also works.
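For completeness, a quick sketch of that route, using the same 0d s as above:

>>> s.item()  # extract the scalar as a plain Python bytes object
b'hello\nworld'
>>> s.item().decode()
'hello\nworld'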
In Python 3 there are two types that represent sequences of characters: bytes and str (which holds Unicode characters). When you use string_ as your type, numpy will return bytes. If you want the regular str, you should use the unicode_ type in numpy:
>>> s = numpy.asarray(numpy.unicode_("hello\nworld"))
>>> s
array('hello\nworld', dtype='<U11')
>>> str(s)
'hello\nworld'
Note that if you don't specify a type for your string (string_ or unicode_), numpy uses the default, which in Python 3.x is str (Unicode characters):
>>> s = numpy.asarray("hello\nworld")
>>> str(s)
'hello\nworld'
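If in doubt, you can check which dtype numpy picked; a quick sketch comparing the two cases:

>>> numpy.asarray("hello\nworld").dtype  # default: Unicode str
dtype('<U11')
>>> numpy.asarray(numpy.string_("hello\nworld")).dtype  # bytes
dtype('S11')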