Is it possible to initialise a numpy recarray that will hold strings, without knowing the length of the strings beforehand?
As a (contrived) example:
mydf = np.empty( (numrows,), dtype=[ ('file_name','STRING'), ('file_size_MB',float) ] )
The problem is that I'm constructing my recarray in advance of populating it with information, and I don't necessarily know the maximum length of file_name in advance.
All my attempts result in the string field being truncated:
>>> mydf = np.empty( (2,), dtype=[('file_name',str),('file_size_mb',float)] )
>>> mydf['file_name'][0]='foobarasdf.tif'
>>> mydf['file_name'][1]='arghtidlsarbda.jpg'
>>> mydf
array([('', 6.9164002347457e-310), ('', 9.9413127e-317)],
dtype=[('file_name', 'S'), ('file_size_mb', '<f8')])
>>> mydf['file_name']
array(['f', 'a'],
dtype='|S1')
(As an aside, why does mydf['file_name'] show 'f' and 'a' whilst mydf shows '' and ''?)
Similarly, if I initialise with type (say) |S10 for file_name then things get truncated at length 10.
The only similar question I could find is this one, but this calculates the appropriate string length a priori and hence is not quite the same as mine (as I know nothing in advance).
Is there any alternative other than initialising the file_name field with (e.g.) |S9999999999999 (i.e. some ridiculous upper limit)?
Instead of using the STRING dtype, one can always use object as dtype. That will allow any object to be assigned to an array element, including Python variable length strings. For example:
>>> import numpy as np
>>> mydf = np.empty( (2,), dtype=[('file_name',object),('file_size_mb',float)] )
>>> mydf['file_name'][0]='foobarasdf.tif'
>>> mydf['file_name'][1]='arghtidlsarbda.jpg'
>>> mydf
array([('foobarasdf.tif', 0.0), ('arghtidlsarbda.jpg', 0.0)],
dtype=[('file_name', '|O8'), ('file_size_mb', '<f8')])
It is against the spirit of the array concept to have variable length elements, but this is as close as one can get. The idea of an array is that elements are stored in memory at well-defined and regularly spaced memory addresses, which prohibits variable length elements. By storing pointers to the strings in the array rather than the strings themselves, one can circumvent this limitation. (This is basically what the above example does.)
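To make the pointer idea concrete, here is a minimal sketch that compares an object field with a fixed-width string field. The variable names (`files`, `fixed`, `pointer_size` and so on) and the sample sizes are made up for illustration, and `U10` is just one arbitrary fixed width chosen to show the truncation.

```python
import numpy as np

names = ['foobarasdf.tif', 'arghtidlsarbda.jpg']

# Object field: each slot holds a reference to a Python string,
# so strings of any length fit without declaring a width up front.
files = np.empty((2,), dtype=[('file_name', object), ('file_size_mb', float)])
files['file_name'] = names
files['file_size_mb'] = [1.5, 20.25]  # made-up sizes

# The slot size is that of one pointer, independent of the string length.
pointer_size = files.dtype['file_name'].itemsize
name_lengths = [len(name) for name in files['file_name']]

# Fixed-width field for comparison: anything past 10 characters is
# silently cut off on assignment.
fixed = np.empty((2,), dtype=[('file_name', 'U10'), ('file_size_mb', float)])
fixed['file_name'] = names
truncated = [str(name) for name in fixed['file_name']]

print(name_lengths)
print(truncated)
```

The trade-off is that an object field gives up NumPy's compact, contiguous string storage: every access goes through a Python object, so vectorised string operations and memory use will generally be worse than with a fixed-width dtype.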