numpy array of strings indexing behavior

Question

I have an array of strings

>>> lines
array(['RL5\Stark_223', 'RL5\Stark_223', 'RL5\Stark_223', ...,
       'RL5\Stark_238', 'RL5\Stark_238', 'RL5\Stark_238'], 
      dtype='|S27')

Why can I index into a string for the first array element

>>> lines[0][0:3]
'RL5'

but not into the same place for all array elements?

>>> lines[:][0:3]
array(['RL5\Stark_223', 'RL5\Stark_223', 'RL5\Stark_223'], 
      dtype='|S27')

Can anyone suggest a method to get the following result:

array(['RL5', 'RL5', 'RL5', ..., 'RL5', 'RL5'])

Jaime · Accepted Answer

To extract the first n characters of every string, you can abuse .astype, since casting to a shorter fixed-width string dtype truncates each element:

>>> import numpy as np
>>> s = np.array(['RL5\Stark_223', 'RL5\Stark_223', 'RL5\Stark_223'])
>>> s
array(['RL5\Stark_223', 'RL5\Stark_223', 'RL5\Stark_223'], 
      dtype='|S13')
>>> s.astype('|S3')
array(['RL5', 'RL5', 'RL5'], 
      dtype='|S3')
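
For anyone on Python 3, where string arrays default to a Unicode dtype rather than `|S`, here is a minimal sketch of the same trick. The variable names are only illustrative, and the raw-string literals are my own choice to keep the backslash literal. It also shows why the slice in the question returns whole elements: `s[:]` is still the full array, so a following `[0:3]` picks the first three elements rather than the first three characters.

```python
import numpy as np

# Raw strings keep the backslash as a literal character.
s = np.array([r'RL5\Stark_223', r'RL5\Stark_223',
              r'RL5\Stark_238', r'RL5\Stark_238'])

# s[:] selects every element, so a second slice still works on elements.
first_three_elements = s[:][0:3]
print(first_three_elements.shape)

# Casting to a 3-character Unicode dtype truncates every element.
prefix = s.astype('U3')
print(prefix)
```

Note that this only helps for a prefix; it cannot extract characters from the middle of each string.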

Daniel · Answer

Don't forget chararrays!

lines.view(np.chararray).ljust(3)
chararray(['RL5', 'RL5', 'RL5', 'RL5', 'RL5', 'RL5'], 
      dtype='|S3')

Although it's strangely slower:

#Extend lines to 600000 elements

%timeit lines.view(np.chararray).ljust(3)
1 loops, best of 3: 542 ms per loop

%timeit np.vectorize(lambda x: x[:3])(lines)
1 loops, best of 3: 239 ms per loop

%timeit map(lambda s: s[0:3], lines)
1 loops, best of 3: 243 ms per loop

%timeit lines.astype('|S3')
100 loops, best of 3: 4.72 ms per loop

This could be because it's duplicating the data. The benefit is that the dtype of the output array is minimized: S3 vs S64.
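
The timings above appear to come from a Python 2 session, so here is a rough sketch for re-running a similar comparison with the standard `timeit` module on Python 3. The array size, the helper names (`by_astype`, `by_vectorize`, `by_map`) and the repeat count are my own assumptions, and the numbers will differ by machine and NumPy version. The chararray variant is left out here.

```python
import timeit
import numpy as np

# Hypothetical test data: 60000 short strings.
lines = np.array([r'RL5\Stark_223', r'RL5\Stark_238'] * 30_000)

def by_astype(a):
    # Truncate by casting to a 3-character Unicode dtype.
    return a.astype('U3')

def by_vectorize(a):
    # Python-level slice applied per element through np.vectorize.
    return np.vectorize(lambda x: x[:3])(a)

def by_map(a):
    # Unlike the original map timing, this also converts back to an array.
    return np.array(list(map(lambda x: x[0:3], a)))

for fn in (by_astype, by_vectorize, by_map):
    seconds = timeit.timeit(lambda: fn(lines), number=3) / 3
    print(f'{fn.__name__}: {seconds * 1e3:.2f} ms per call')
```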

DonaldAnderson · Answer

Try this (on Python 3, map returns an iterator, so wrap the call in list() to get the values):

map(lambda s: s[0:3], lines)
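
A small sketch of how the per-element approach can be turned back into a NumPy array, assuming Python 3. The sample data and the names `prefixes` and `middles` are made up for illustration. Slower than a dtype cast, but it works for any slice, not only a prefix:

```python
import numpy as np

# Hypothetical sample data; raw strings keep the backslash literal.
lines = np.array([r'RL5\Stark_223', r'RL5\Stark_238'])

# Slice each string in Python, then rebuild an array from the pieces.
prefixes = np.array([s[0:3] for s in lines])

# The same pattern handles a slice from the middle of each string.
middles = np.array([s[4:9] for s in lines])

print(prefixes)
print(middles)
```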
