Say I have data like d = [dict(animal='cat', weight=5), dict(animal='dog', weight=20)]
(basically JSON, where all entries have consistent data types).
In Pandas you can make this a table with df = pandas.DataFrame(d)
-- is there something comparable for plain NumPy record arrays? np.rec.fromrecords(d)
doesn't seem to give me what I want.
My proposal (basically a slightly improved version of hpaulj's answer):
dicts = [dict(animal='cat', weight=5), dict(animal='dog', weight=20)]
Creation of the dtype object:
import numpy as np

dt_tuples = []
for key, value in dicts[0].items():
    if not isinstance(value, str):
        # let NumPy infer the dtype from a sample value
        value_dtype = np.array([value]).dtype
    else:
        # strings need an explicit width: use the longest value across all dicts
        value_dtype = '|S{}'.format(max(len(d[key]) for d in dicts))
    dt_tuples.append((key, value_dtype))
dt = np.dtype(dt_tuples)
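With the example dicts above, this should produce something like the following (the integer width is platform-dependent; '<i8' on a typical 64-bit build):

>>> dt
dtype([('animal', 'S3'), ('weight', '<i8')])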
As you can see, there's a problem with string handling - we need to check their maximum length to define the dtype. This additional step can be skipped if you have no string values in your dicts, or if you're sure that all those values have exactly the same length.
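To see why the length check matters, here's a minimal sketch: inferring the dtype from the first dict alone would fix the string width too early, and any longer value would be silently truncated:

>>> np.array(['cat']).dtype   # width taken from the first value only
dtype('<U3')
>>> a = np.zeros(1, dtype=[('animal', '|S3')])
>>> a[0] = ('horse',)
>>> a[0]
(b'hor',)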
If you're looking for a one-liner, it would be something like this:
dt = np.dtype([(k, np.array([v]).dtype if not isinstance(v, str) else '|S{}'.format(max([len(d[k]) for d in dicts]))) for k, v in dicts[0].items()])
(still, it's probably better to break it up for readability).
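Side note: on Python 3, '|S' fields store bytes (b'cat' rather than 'cat'). If you'd rather keep native str values, you can substitute '|U{}' for '|S{}' in either variant; the width logic stays the same:

value_dtype = '|U{}'.format(max(len(d[key]) for d in dicts))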
Values list:
values = [tuple(d[name] for name in dt.names) for d in dicts]
Because we iterate over dt.names, we can be sure that the order of the values is correct.
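For the example data above this should give:

>>> values
[('cat', 5), ('dog', 20)]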
And, finally, the array creation:
a = np.array(values, dtype=dt)
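The result is a structured array whose fields can be accessed by name (again assuming '<i8' ints; exact repr spacing may differ by NumPy version):

>>> a
array([(b'cat',  5), (b'dog', 20)],
      dtype=[('animal', 'S3'), ('weight', '<i8')])
>>> a['weight']
array([ 5, 20])

If you specifically want a recarray with attribute access (a.weight), a.view(np.recarray) converts it.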