NumPy's string dtype seems to correspond to Python's str and thus to change between Python 2.x and 3.x:
In Python 2.7:
In [1]: import numpy as np
In [2]: np.dtype((np.str_, 1)).itemsize
Out[2]: 1
In [3]: np.dtype((np.unicode_, 1)).itemsize
Out[3]: 4
In Python 3.3:
In [2]: np.dtype((np.str_, 1)).itemsize
Out[2]: 4
The version of NumPy is 1.7.0 in both cases.
I'm writing some code that I want to work on both Python versions, and I want an array of ASCII strings (4x memory overhead is not acceptable). So the questions are:
How do I specify a dtype that stores one byte per character on both Python versions?
(Bonus) Can I restrict the character set, e.g. to ascii_uppercase, and save a bit or two per char?
Something that I see as a potential answer to the first question is character arrays (i.e. an array of character arrays instead of an array of strings). It seems like I can specify the item size when constructing one:
chararray(shape, itemsize=1, unicode=False, buffer=None, offset=0,
strides=None, order=None)
Update: nah, the itemsize is actually the number of characters. But there's still unicode=False.
Is that the way to go?
Will it answer the last question, too?
And how do I actually use it as a dtype?
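For reference, here's roughly what I have in mind with chararray (a quick sketch; the shape and contents are just for illustration):
>>> import numpy as np
>>> a = np.chararray((3,), itemsize=5, unicode=False)  # bytes-backed, 1 byte per char
>>> a[:] = [b'foo', b'bar', b'baz']
>>> a.dtype, a.itemsize
(dtype('S5'), 5)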
You can use the 'S' typestr:
>>> np.array(['Hello', 'World'], dtype='S')
array([b'Hello', b'World'],
dtype='|S5')
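A length can also be given as part of the typestr, and the itemsize is one byte per character on both 2.x and 3.x:
>>> np.dtype('S1').itemsize  # same on 2.7 and 3.x
1
>>> np.dtype('S5').itemsize
5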
Also in 2.6/2.7 str is aliased to bytes (or np.bytes_):
>>> np.dtype((bytes, 1)) # 2.7
dtype('|S1')
>>> np.dtype((bytes, 1)) # 3.2
dtype('|S1')
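To use it as a dtype when constructing an array, pass it like any other dtype (the shape and values here are just for illustration):
>>> a = np.zeros(3, dtype=(np.bytes_, 5))  # same as dtype='S5'
>>> a.dtype
dtype('S5')
>>> a[0] = b'Hello'
>>> a[0]                                   # 3.x shown; 2.7 prints 'Hello'
b'Hello'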
And b'' literals are supported:
>>> np.array([b'Hello', b'World']) # 2.7
array(['Hello', 'World'],
dtype='|S5')
>>> np.array([b'Hello', b'World']) # 3.2
array([b'Hello', b'World'],
dtype='|S5')
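On 3.x the elements come back as bytes, so decode them if you need text again (assuming the data is ASCII):
>>> a = np.array([b'Hello', b'World'])  # 3.x
>>> a[0].decode('ascii')
'Hello'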