
How to create a numpy array of arbitrary length strings?

I'm a complete rookie to Python, but it seems that a given string can be of (effectively) arbitrary length, i.e. you can take a string str and keep adding to it: str += "some stuff...". Is there a way to make an array of such strings?

When I try this, each element only stores a single character:

strArr = numpy.empty(10, dtype='string')
for i in range(0,10):
    strArr[i] = "test"

On the other hand, I know I can initialize an array of fixed-length strings, e.g.

strArr = numpy.empty(10, dtype='S256')

which can store 10 strings of up to 256 characters each.

asked Feb 01 '13 03:02 by DilithiumMatrix



2 Answers

You can do so by creating an array with dtype=object. A normal numpy string array has a fixed element width, so if you assign a longer string to it, the string is truncated:

>>> a = numpy.array(['apples', 'foobar', 'cowboy'])
>>> a[2] = 'bananas'
>>> a
array(['apples', 'foobar', 'banana'], 
      dtype='|S6')
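
On Python 3 with a recent NumPy the same truncation should occur, although the inferred dtype is a fixed-width Unicode type rather than '|S6'. A minimal sketch, assuming NumPy is imported under the alias np (not used in the original answer) and using an arbitrary illustrative width of 16 for the wider array:

```python
import numpy as np

# NumPy infers a fixed-width string dtype from the longest initial element.
a = np.array(['apples', 'foobar', 'cowboy'])
width = a.dtype.itemsize // np.dtype('U1').itemsize  # characters per element

# Assigning a longer string silently truncates it to that width.
a[2] = 'bananas'
truncated = a[2]

# Converting to a wider dtype first avoids this particular truncation,
# but the width is still fixed (16 is an arbitrary choice for illustration).
wide = a.astype('U16')
wide[2] = 'bananas'
```

Widening the dtype only moves the limit; it does not give arbitrary-length strings.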

But when you use dtype=object, you get an array of Python object references, so you get all the behaviors of Python strings:

>>> a = numpy.array(['apples', 'foobar', 'cowboy'], dtype=object)
>>> a
array([apples, foobar, cowboy], dtype=object)
>>> a[2] = 'bananas'
>>> a
array([apples, foobar, bananas], dtype=object)
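
Applied to the loop from the question, the same idea might look like the sketch below. The variable names and the np alias are illustrative assumptions, not part of the original answer:

```python
import numpy as np

# Allocate 10 slots; each holds a reference to a Python object (initially None).
str_arr = np.empty(10, dtype=object)

for i in range(10):
    # Each element is an ordinary Python str, so its length is unrestricted.
    str_arr[i] = "test" * (i + 1)

# In-place concatenation on a single element also works.
str_arr[0] += " and some more stuff..."

lengths = [len(s) for s in str_arr]
```

Each element keeps its full length because the array stores references to the strings, not the characters themselves.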

Indeed, because it's an array of objects, you can assign any kind of Python object to the array:

>>> a[2] = {1:2, 3:4}
>>> a
array([apples, foobar, {1: 2, 3: 4}], dtype=object)

However, this undoes a lot of the benefits of using numpy, which is fast because it works on large contiguous blocks of raw memory. Working with Python objects adds a lot of overhead. A simple example:

>>> a = numpy.array(['abba' for _ in range(10000)])
>>> b = numpy.array(['abba' for _ in range(10000)], dtype=object)
>>> %timeit a.copy()
100000 loops, best of 3: 2.51 us per loop
>>> %timeit b.copy()
10000 loops, best of 3: 48.4 us per loop
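
The %timeit lines above are IPython magics. Outside IPython, a rough equivalent can be sketched with the standard-library timeit module. The repeat count of 1000 is an arbitrary assumption, and the absolute numbers will vary with the machine and the NumPy version:

```python
import timeit
import numpy as np

a = np.array(['abba'] * 10000)                 # fixed-width strings
b = np.array(['abba'] * 10000, dtype=object)   # Python object references

# Time 1000 copies of each array; absolute numbers depend on the machine.
t_fixed = timeit.timeit(a.copy, number=1000)
t_object = timeit.timeit(b.copy, number=1000)

print(f"fixed-width: {t_fixed / 1000 * 1e6:.2f} us per copy")
print(f"object:      {t_object / 1000 * 1e6:.2f} us per copy")
```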
answered Sep 29 '22 18:09 by senderle


You could use the object data type:

>>> import numpy
>>> s = numpy.array(['a', 'b', 'dude'], dtype='object')
>>> s[0] += 'bcdef'
>>> s
array([abcdef, b, dude], dtype=object)
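
Concatenation can also be applied to a whole object array at once, since NumPy falls back to each element's own + operator. A small sketch, with illustrative names and the np alias as assumptions:

```python
import numpy as np

s = np.array(['a', 'b', 'dude'], dtype=object)

# With dtype=object, + is applied element by element using Python's str addition.
suffixed = s + '!'

# Two object arrays of the same shape concatenate pairwise.
paired = s + np.array(['1', '22', '333'], dtype=object)
```

This still runs a Python-level operation per element, so it is a convenience and not a speed-up.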
answered Sep 25 '22 18:09 by jterrace