
How to decode a numpy array of encoded literals/strings in Python3? AttributeError: 'numpy.ndarray' object has no attribute 'decode'

In Python 3, I have the following NumPy array of strings.

Each string in the NumPy array is in the form b'MD18EE' instead of MD18EE.

For example:

import numpy as np
print(array1)
(b'first_element', b'element',...)

Normally, one would use .decode('UTF-8') to decode these elements.

However, if I try:

array1 = array1.decode('UTF-8')

I get the following error:

AttributeError: 'numpy.ndarray' object has no attribute 'decode'

How do I decode the elements of a NumPy array? (That is, I don't want the b'' prefix.)

EDIT:

Let's say I were dealing with a pandas DataFrame in which only certain columns are encoded in this manner. For example:

import pandas as pd
df = pd.DataFrame(...)

df
        COL1          ....
0   b'entry1'         ...
1   b'entry2'
2   b'entry3'
3   b'entry4'
4   b'entry5'
5   b'entry6'
ShanZhengYang, asked Nov 02 '16 20:11


2 Answers

If you want the result to be a (Python) list of strings, you can use a list comprehension:

>>> l = [el.decode('UTF-8') for el in array1]
>>> print(l)
['element', 'element 2']
>>> print(type(l))
<class 'list'>

Alternatively, if you want to keep it as a NumPy array, you can use np.vectorize to make a vectorized decoder function:

>>> decoder = np.vectorize(lambda x: x.decode('UTF-8'))
>>> array2 = decoder(array1)
>>> print(array2)
['element' 'element 2']
>>> print(type(array2))
<class 'numpy.ndarray'>
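
One practical difference between the two approaches is that the list comprehension only walks the first axis, whereas the np.vectorize decoder is applied element by element and keeps the array's shape. Here is a minimal sketch of that, assuming a made-up 2-D array1 (the values are placeholders, not the asker's data):

```python
import numpy as np

# Hypothetical 2-D sample standing in for array1 from the question.
array1 = np.array([[b'first_element', b'element'],
                   [b'MD18EE', b'element 2']])

# Same decoder as above: calls bytes.decode on every element.
decoder = np.vectorize(lambda x: x.decode('UTF-8'))
array2 = decoder(array1)

print(array2.shape)     # shape is preserved
print(array2.tolist())  # nested lists of plain str objects
```

Note that np.vectorize is essentially a convenience wrapper around a Python-level loop, so it should not be expected to run much faster than the list comprehension.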
Wander Nauta, answered Oct 29 '22 09:10


You have an array of bytestrings; dtype is S:

In [338]: arr=np.array((b'first_element', b'element'))
In [339]: arr
Out[339]: 
array([b'first_element', b'element'], 
      dtype='|S13')

astype easily converts them to Unicode, the default string type in Python 3:

In [340]: arr.astype('U13')
Out[340]: 
array(['first_element', 'element'], 
      dtype='<U13')

There is also a library of string functions, np.char, which applies the corresponding str method to each element of a string array:

In [341]: np.char.decode(arr)
Out[341]: 
array(['first_element', 'element'], 
      dtype='<U13')

astype is faster, but np.char.decode lets you specify an encoding.
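
The encoding argument starts to matter once the bytes are not plain ASCII. The sketch below uses a made-up sample ('café' encoded two different ways) to show np.char.decode with an explicit encoding. The astype behaviour in the last step is what I would expect from the NumPy versions I have used (the S-to-U cast treats the bytes as ASCII), so take that part as an assumption rather than a guarantee:

```python
import numpy as np

# Hypothetical sample data: 'café' encoded as UTF-8 and as Latin-1.
utf8_arr = np.array([b'caf\xc3\xa9'])
latin1_arr = np.array([b'caf\xe9'])

# np.char.decode accepts the encoding, like bytes.decode does.
decoded_utf8 = np.char.decode(utf8_arr, 'utf-8')
decoded_latin1 = np.char.decode(latin1_arr, 'latin-1')
print(decoded_utf8, decoded_latin1)

# astype takes no encoding argument; with non-ASCII bytes the cast
# is expected to fail (assumption: the cast decodes as ASCII).
try:
    print(utf8_arr.astype('U'))
except UnicodeDecodeError as exc:
    print('astype could not decode:', exc)
```

So for pure-ASCII data either approach gives the same result, and np.char.decode is the safer choice when the encoding is known to be something else.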

See also: "How to decode a numpy array of dtype=numpy.string_?"
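
For the DataFrame case in the question's edit, pandas has its own Series.str.decode, which can be applied only to the columns that hold bytes. This is a rough sketch with a made-up DataFrame; COL2 and the bytes_cols list are hypothetical names used for illustration, not something from the question:

```python
import pandas as pd

# Hypothetical DataFrame: COL1 holds bytes, COL2 is an ordinary column.
df = pd.DataFrame({
    'COL1': [b'entry1', b'entry2', b'entry3'],
    'COL2': [1, 2, 3],
})

# Assumed list of columns known to contain bytes objects.
bytes_cols = ['COL1']

# Decode each bytes column in place; other columns are left untouched.
for col in bytes_cols:
    df[col] = df[col].str.decode('utf-8')

print(df)
```

Listing the columns explicitly avoids guessing which object-dtype columns contain bytes rather than ordinary strings.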

hpaulj, answered Oct 29 '22 08:10