ESRI provides functions for moving data from tables to NumPy arrays and back. I have a script that takes census data from an API call, converts it into arrays, does some simple math, and then, ideally, writes the result out to a table. To do the math, the array cannot be a recarray. No combination of vstack, hstack, or concatenate seemed to give a good result. I resorted to creating individual 1-d arrays as recarrays and then merging them with np.lib.recfunctions.merge_arrays. Surely there is a better way.
ESRI's return from TableToNumPyArray:
>>> testArray
array([ (41039000100.0, 2628.0, 100.0, 2339.0, 135.0, 18.0, 22.0, 16.0, 25.0, 0.0, 92.0, 0.0, 92.0, 0.0, 92.0, 0.0, 92.0, 6.0, 9.0, 249.0, 90.0, 0.0, 92.0, 1, u'41039000100'),
...
dtype=[('Geo_id', '<f8'), ('TotalUnits', '<f8'), ('MOE_Total', '<f8'), ('Total_1_detached', '<f8'), ('MOE_Total_1_detached', '<f8'), ('Total_1_attached', '<f8'), ('MOE_Total_1_attached', '<f8'), ('Total_2', '<f8'), ('MOE_Total_2', '<f8'), ('Total_3_or_4', '<f8'), ('MOE_Total_3_or_4', '<f8'), ('Total_5_to_9', '<f8'), ('MOE_Total_5_to_9', '<f8'), ('Total_10_to_19', '<f8'), ('MOE_Total_10_to_19', '<f8'), ('Total_20_to_49', '<f8'), ('MOE_Total_20_to_49', '<f8'), ('Total_50_or_more', '<f8'), ('MOE_Total_50_or_more', '<f8'), ('Total_Mobile_home', '<f8'), ('MOE_Total_Mobile_home', '<f8'), ('Total_Boat_RV_van_etc', '<f8'), ('MOE_Total_Boat_RV_van_etc', '<f8'), ('ObjectID', '<i4'), ('geo_id_t', '<U50')])
My snippet of code looks like
try:
    # Assign Geo_id array
    Geo_id_array = B25008_001E_array[...,0]
    Tpop_array = B25008_001E_array[...,1]
    Tunits_array = B25024_001E_array[...,1]
    # divide by zero is possible for real rows and definite for the end-of-file
    # tract, so convert nan's in the HHsize_array to zero's with nan_to_num
    # HHsize_array = Tpop_array.view(np.float32)/Tunits_array.view(np.float32)
    HHsize_array = Tpop_array/Tunits_array
    HHsize_array = nan_to_num(HHsize_array)
    # Table_array = array(vstack((Geo_id_array, Tpop_array, Tunits_array, HHsize_array)), dtype = ([('Geo_id', '|S13'), ('Tpop', np.int32), ('Tunits_array', np.int32), ('HHsize', np.float32)]))
    # Table_array = np.hstack((Geo_id_array, Tpop_array, Tunits_array, HHsize_array))
    Geo_id_recarray = np.array(Geo_id_array, dtype = ([('Geo_id', '|S13')]))
    Tpop_recarray = np.array(Tpop_array, dtype = ([('Tpop', np.int32)]))
    Tunits_recarray = np.array(Tunits_array, dtype = ([('Tunits_array', np.int32)]))
    HHsize_recarray = np.array(HHsize_array, dtype = ([('HHsize', np.float32)]))
    arrays = [Geo_id_recarray, Tpop_recarray, Tunits_recarray, HHsize_recarray]
    MergedArray = np.lib.recfunctions.merge_arrays(arrays, usemask=False)
    print
    print
except Exception as e:
    # If an error occurred, print line number and error message
    import traceback, sys
    tb = sys.exc_info()[2]
    print "An error occurred on line %i" % tb.tb_lineno
    print str(e)
I'd prefer to merge/join/stack the arrays before structuring them, I think. Thoughts?
You should be able to use the structured arrays (technically you're not using recarrays) to do the "simple math." I'm not sure if you're showing the math you'd like to do, but for example if you want to do:
HHsize_array = Tpop_array/Tunits_array
but don't want to have all those separate arrays, you could simply do the math on views of the main (merged) array, let's call it data:
data['HHsize'] = data['Tpop']/data['Tunits']
where HHsize, Tpop, and Tunits are all field names in one structured array called data, such that you'd have
>>> data.dtype
dtype([('Geo_id', 'S13'), ('Tpop', '<i4'), ('Tunits', '<i4'), ('HHsize', '<f4')])
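To make that concrete, here is a minimal sketch of allocating one structured array up front, filling its fields from plain 2-D arrays, and doing the division directly on the fields. The `pop` and `units` arrays and their values are made-up stand-ins for B25008_001E_array and B25024_001E_array, and I'm assuming column 0 holds the tract id and column 1 the estimate, as in your snippet.

```python
import numpy as np

# Hypothetical stand-ins for the plain 2-D arrays built from the API call:
# column 0 is the tract id, column 1 is the estimate (values are made up).
pop = np.array([[41039000100.0, 2628.0],
                [41039000200.0, 3100.0],
                [41039999999.0, 0.0]])
units = np.array([[41039000100.0, 1200.0],
                  [41039000200.0, 1550.0],
                  [41039999999.0, 0.0]])

# Allocate a single structured array with every field it will need.
dt = np.dtype([('Geo_id', 'S13'), ('Tpop', np.int32),
               ('Tunits', np.int32), ('HHsize', np.float32)])
data = np.zeros(pop.shape[0], dtype=dt)

# Fill each field from a plain column; values are cast to the field's dtype.
data['Geo_id'] = ['%d' % g for g in pop[:, 0]]
data['Tpop'] = pop[:, 1]
data['Tunits'] = units[:, 1]

# Indexing by field name gives an ordinary 1-D view, so arithmetic works.
# Cast to float so the division is not integer division, and silence the
# divide-by-zero warnings for the empty end-of-file tract.
with np.errstate(divide='ignore', invalid='ignore'):
    hhsize = data['Tpop'] / data['Tunits'].astype(np.float64)

# 0/0 produces nan, which nan_to_num replaces with 0.0.
data['HHsize'] = np.nan_to_num(hhsize)

print(data)
```

With this layout there is no need for merge_arrays at all, and the finished array should be in the structured form that ESRI's array-to-table function expects, though I haven't tested that round trip here.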