I have an array of strings:
>>> lines
array(['RL5\\Stark_223', 'RL5\\Stark_223', 'RL5\\Stark_223', ...,
'RL5\\Stark_238', 'RL5\\Stark_238', 'RL5\\Stark_238'],
dtype='|S27')
Why can I index into a string for the first array element:
>>> lines[0][0:3]
'RL5'
but not into the same place for all array elements?
>>> lines[:][0:3]
array(['RL5\\Stark_223', 'RL5\\Stark_223', 'RL5\\Stark_223'],
dtype='|S27')
Can anyone suggest a method to get the following result:
array(['RL5', 'RL5', 'RL5', ..., 'RL5', 'RL5'])
To extract the first n characters of every string you can abuse .astype:
>>> s = np.array(['RL5\\Stark_223', 'RL5\\Stark_223', 'RL5\\Stark_223'])
>>> s
array(['RL5\\Stark_223', 'RL5\\Stark_223', 'RL5\\Stark_223'],
dtype='|S13')
>>> s.astype('|S3')
array(['RL5', 'RL5', 'RL5'],
dtype='|S3')
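As for why the original attempt fails: lines[:] is just the whole array again, so the second [0:3] slices array elements rather than characters. Here is a minimal sketch of the same .astype trick on Python 3, where string arrays default to a unicode dtype. The sample data and the names n and prefixes are made up for illustration, not taken from the question.

```python
import numpy as np

# Hypothetical sample data mirroring the array in the question.
lines = np.array(['RL5\\Stark_223', 'RL5\\Stark_223', 'RL5\\Stark_238'])

# lines[:] is the whole array, so a second [0:3] picks the first
# three ELEMENTS, not the first three characters of each element.
first_three_elements = lines[:][0:3]

# Casting to a narrower fixed-width string dtype truncates every element.
n = 3
prefixes = lines.astype(f'U{n}')  # for byte-string arrays use f'S{n}' instead

print(prefixes)
```

This only works for a prefix; it cannot take a slice from the middle of each string.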
Don't forget chararrays!
lines.view(np.chararray).ljust(3)
chararray(['RL5', 'RL5', 'RL5', 'RL5', 'RL5', 'RL5'],
dtype='|S3')
Although it's strangely slower:
#Extend lines to 600000 elements
%timeit lines.view(np.chararray).ljust(3)
1 loops, best of 3: 542 ms per loop
%timeit np.vectorize(lambda x: x[:3])(lines)
1 loops, best of 3: 239 ms per loop
%timeit map(lambda s: s[0:3], lines)
1 loops, best of 3: 243 ms per loop
%timeit lines.astype('|S3')
100 loops, best of 3: 4.72 ms per loop
This could be because it's duplicating the data. The benefit is that the dtype of the output array is minimized: S3 vs S64.
Try this:
map(lambda s:s[0:3],lines)
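Note that on Python 3, map() returns a lazy iterator rather than a list, so the result has to be materialized to get an array back. A rough equivalent using a list comprehension, with made-up sample data and a hypothetical name prefixes:

```python
import numpy as np

# Hypothetical sample data mirroring the array in the question.
lines = np.array(['RL5\\Stark_223', 'RL5\\Stark_238'])

# Slice each string in plain Python, then rebuild a NumPy array.
prefixes = np.array([s[0:3] for s in lines])

print(prefixes)
```

Unlike the .astype approach, this works for any slice (for example s[4:9]), at the cost of a Python-level loop.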