
Easiest way to create a NumPy record array from a list of dictionaries?

Tags: python, numpy

Say I have data like d = [dict(animal='cat', weight=5), dict(animal='dog', weight=20)] (basically JSON, where all entries have consistent data types).

In Pandas you can make this a table with df = pandas.DataFrame(d) -- is there something comparable for plain NumPy record arrays? np.rec.fromrecords(d) doesn't seem to give me what I want.

Roger asked Jul 16 '14 23:07


1 Answer

My proposal (essentially a slightly improved version of hpaulj's answer):

import numpy as np

dicts = [dict(animal='cat', weight=5), dict(animal='dog', weight=20)]

Creating the dtype object:

dt_tuples = []
for key, value in dicts[0].items():
    if not isinstance(value, str):
        # Let NumPy infer the dtype of non-string values.
        value_dtype = np.array([value]).dtype
    else:
        # Fixed-width byte string sized to the longest value under this key.
        value_dtype = '|S{}'.format(max(len(d[key]) for d in dicts))
    dt_tuples.append((key, value_dtype))
dt = np.dtype(dt_tuples)

As you can see, strings need special handling: NumPy string fields are fixed-width, so we have to find the maximum length of the values under each string key before we can define the dtype. You can skip this extra branch if your dicts contain no string values, or if you are sure that all the strings under a given key have exactly the same length.
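To illustrate why the maximum length matters, here is a minimal sketch of what happens when the width is inferred from the first value only. The sample names are made up for this illustration, and it relies on the fact that under Python 3 NumPy infers a Unicode ('U') dtype for str values rather than the byte-string ('S') dtype used above.

```python
import numpy as np

# Hypothetical sample data, not taken from the original post.
names = ['cat', 'giraffe']

# Width inferred from the first value only: a 3-character field.
too_narrow = np.array([names[0]]).dtype
truncated = np.array(names, dtype=too_narrow)

# Width taken from the longest value: nothing is lost.
wide_enough = np.dtype('U{}'.format(max(len(n) for n in names)))
intact = np.array(names, dtype=wide_enough)

print(truncated.tolist())  # the longer name is silently cut to 3 characters
print(intact.tolist())
```

NumPy raises no error when a string does not fit the field, so the truncation is easy to miss.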

If you're looking for a one-liner, it would be something like this:

dt = np.dtype([(k, np.array([v]).dtype if not isinstance(v, str) else '|S{}'.format(max([len(d[k]) for d in dicts]))) for k, v in dicts[0].items()])

(though it's probably better to break it up for readability).

Building the list of values:

values = [tuple(d[name] for name in dt.names) for d in dicts]

Because we iterate over dt.names, the values in each tuple are guaranteed to be in the same order as the dtype fields, regardless of the key order in the individual dicts.

And finally, creating the array:

a = np.array(values, dtype=dt)
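Putting the steps together, the following is a rough sketch rather than a definitive implementation. The helper name dicts_to_recarray is hypothetical, and it departs from the code above in two ways that are my own assumptions: it uses a Unicode ('U') field instead of '|S' so that string fields come back as str under Python 3, and it views the result as np.recarray to allow attribute access.

```python
import numpy as np

def dicts_to_recarray(dicts):
    """Hypothetical helper: build a record array from a list of flat dicts."""
    dt_tuples = []
    for key, value in dicts[0].items():
        if isinstance(value, str):
            # Assumption: Unicode field instead of the '|S' byte string above.
            width = max(1, max(len(d[key]) for d in dicts))
            value_dtype = 'U{}'.format(width)
        else:
            # Let NumPy infer the dtype of non-string values.
            value_dtype = np.array([value]).dtype
        dt_tuples.append((key, value_dtype))
    dt = np.dtype(dt_tuples)
    # Iterating over dt.names keeps values aligned with the dtype fields.
    values = [tuple(d[name] for name in dt.names) for d in dicts]
    return np.array(values, dtype=dt).view(np.recarray)

# The second dict lists its keys in a different order on purpose.
dicts = [dict(animal='cat', weight=5), dict(weight=20, animal='dog')]
rec = dicts_to_recarray(dicts)
print(rec.animal, rec.weight)
```

Like the code above, this takes the field names and types from the first dict only, so it assumes every dict has the same keys with consistent value types.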

Zuku answered Sep 22 '22 07:09