NumPy arrays clearly work great for numerical data, but are they slower to use for non-numerical data?
For instance, say I have some nested lists of text data:
mammals = ['dog', 'cat', 'rat']
birds = ['stork', 'robin', 'penguin']
animals1 = [mammals, birds]
When accessing and manipulating this data, is the list of nested lists going to be faster than the equivalent NumPy array?
import numpy as np
animals2 = np.array(animals1)
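By accessing and manipulating I mean operations such as element lookups and per-string transformations; for example (illustrative):

# Equivalent element access in both structures:
animals1[1][2]    # 'penguin' (nested-list indexing)
animals2[1, 2]    # 'penguin' (NumPy multidimensional indexing)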
Since NumPy arrays are implemented as "strided" arrays with a fixed element size, a string array's dtype must be wide enough for its longest string, and every element is padded to that width. So a list of mostly short strings with a few long ones will use a disproportionate amount of memory if converted to a NumPy array. But what about speed?
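Here is a quick sketch of the padding effect (the exact dtype and byte counts depend on your build; the comments assume 4-byte UCS-4 characters):

import numpy as np

short = np.array(['dog', 'cat', 'rat'])
mixed = np.array(['dog', 'cat', 'a rather long animal description'])

# Every element is padded to the width of the longest string:
print(short.dtype, short.itemsize)   # <U3, 12 bytes per element
print(mixed.dtype, mixed.itemsize)   # <U32, 128 bytes per element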
As @JoshAdel has pointed out, you should become familiar with the timeit module. I believe you are asking about this comparison:
>>> import timeit
>>> setup1 = """\
... mammals = ['dog', 'cat', 'rat']
... birds = ['stork', 'robin', 'penguin']
... animals1 = [mammals, birds]
... """
>>> timeit.timeit('[[x.upper() for x in y] * 10000 for y in animals1]', setup=setup1, number=10000)
1.7549941045438686
>>> setup2 = """\
... import numpy
... mammals = ['dog', 'cat', 'rat']
... birds = ['stork', 'robin', 'penguin']
... animals1 = [mammals, birds] * 10000
... animals2 = numpy.array(animals1)
... """
>>> timeit.timeit("numpy.char.upper(animals2)", setup=setup2, number=10000)
221.09816223832195
I updated the test based on your comment. The question is a good one, but to get a full picture you may need to try other operations from numpy.char and time them the same way. For what it's worth, the numpy.char source delegates to a compiled extension module (a .pyd, i.e. DLL-type, file on Windows) through a _vec_string function.
Clearly there is a difference between the two snippets above, with NumPy taking over 100 times longer to execute numpy.char.upper() than Python takes to execute the .upper() string method. Note, though, that the snippets are not doing identical work: the list comprehension calls .upper() on only the three strings in each sublist and then replicates the resulting list 10000 times (copying references), while numpy.char.upper() actually uppercases all 60,000 elements, so the raw ratio overstates the per-call gap.
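Here is a sketch of a more apples-to-apples comparison that replicates the input in both setups and times only the uppercasing; timings are omitted because they depend on your machine:

import timeit

setup = """
mammals = ['dog', 'cat', 'rat']
birds = ['stork', 'robin', 'penguin']
animals1 = [mammals, birds] * 10000
import numpy
animals2 = numpy.array(animals1)
"""

# Pure Python: actually uppercase every string in the nested list.
t_list = timeit.timeit('[[x.upper() for x in y] for y in animals1]', setup=setup, number=100)

# NumPy: uppercase every element of the 20000 x 3 array.
t_numpy = timeit.timeit('numpy.char.upper(animals2)', setup=setup, number=100)

print(t_list, t_numpy)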
timeit is very simple to use for small snippets of code like this.
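As an example of trying another numpy.char operation the same way, here is a hypothetical sketch timing element-wise concatenation with numpy.char.add (again, no timings shown, since they vary by machine):

import timeit

setup = """
import numpy
animals2 = numpy.array([['dog', 'cat', 'rat'], ['stork', 'robin', 'penguin']] * 10000)
"""

# numpy.char.add broadcasts the scalar '!' and concatenates it onto every element.
print(timeit.timeit("numpy.char.add(animals2, '!')", setup=setup, number=100))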