Easiest way to create a NumPy record array from a list of dictionaries?

Question

Say I have data like d = [dict(animal='cat', weight=5), dict(animal='dog', weight=20)] (basically JSON, where all entries have consistent data types).

In Pandas you can make this a table with df = pandas.DataFrame(d) -- is there something comparable for plain NumPy record arrays? np.rec.fromrecords(d) doesn't seem to give me what I want.

Zuku · Accepted Answer

My proposal (essentially a slightly improved version of hpaulj's answer):

import numpy as np
dicts = [dict(animal='cat', weight=5), dict(animal='dog', weight=20)]

Creating the dtype object:

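dt_tuples = []
for key, value in dicts[0].items():
    if not isinstance(value, str):
        # Let NumPy infer the dtype of non-string values (an integer dtype for 5).
        value_dtype = np.array([value]).dtype
    else:
        # Fixed-width byte string sized to the longest value stored under this key.
        # On Python 3, 'U{}' would give a Unicode (str) field instead of bytes.
        value_dtype = '|S{}'.format(max(len(d[key]) for d in dicts))
    dt_tuples.append((key, value_dtype))
dt = np.dtype(dt_tuples)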

As you can see, strings need special handling: string fields in a dtype have a fixed width, so we have to find the maximum length across all dicts to define it. This extra condition can be skipped if your dicts contain no string values, or if you're sure that all the strings under a given key have exactly the same length.
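
To illustrate why the width matters, here is a minimal sketch (the word list is made up for the example, not taken from the question). It assumes ASCII-only strings, since the 'S' dtype stores bytes. A field that is too narrow silently truncates longer values instead of raising an error:

```python
import numpy as np

words = ['cat', 'horse']  # illustrative data only

# Width taken from the first value only ('cat' -> 3 bytes): 'horse' is cut short.
too_narrow = np.array(words, dtype='S3')

# Width taken from the longest value: nothing is lost.
wide_enough = np.array(words, dtype='S{}'.format(max(len(w) for w in words)))

print(too_narrow)   # [b'cat' b'hor']
print(wide_enough)  # [b'cat' b'horse']
```

Because the truncation is silent, it is usually safer to compute the maximum length than to guess a width.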

If you're looking for a one-liner, it would be something like this:

dt = np.dtype([(k, np.array([v]).dtype if not isinstance(v, str) else '|S{}'.format(max([len(d[k]) for d in dicts]))) for k, v in dicts[0].items()])

(though it's probably better to break it up for readability).

Values list:

values = [tuple(d[name] for name in dt.names) for d in dicts]

Because we iterate over dt.names, the values in each tuple are guaranteed to be in the same order as the dtype's fields.
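
A small sketch of why this lookup by name is useful, assuming a hypothetical input where the second dict lists its keys in a different order (the dtype is written out by hand here to keep the example short):

```python
import numpy as np

# Hypothetical input: the second dict has its keys in a different order.
dicts = [dict(animal='cat', weight=5), dict(weight=20, animal='dog')]
dt = np.dtype([('animal', 'S3'), ('weight', 'i8')])

# Look each value up by field name instead of relying on the dict's own order.
values = [tuple(d[name] for name in dt.names) for d in dicts]
print(values)  # [('cat', 5), ('dog', 20)]
```

Using tuple(d.values()) instead would put 20 in the 'animal' slot for the second record.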

And finally, the array creation:

a = np.array(values, dtype=dt)
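
Putting the steps together, one possible end-to-end sketch looks like this. The function name dicts_to_recarray is made up for illustration, and two choices here are assumptions rather than part of the answer above: it uses 'U' (Unicode) fields instead of '|S' so that strings come back as str on Python 3, and it adds .view(np.recarray), because np.array with a structured dtype returns a structured array rather than a record array with attribute access. It also assumes a non-empty list whose dicts all share the same keys.

```python
import numpy as np


def dicts_to_recarray(dicts):
    """Build a record array from a non-empty list of dicts with identical keys."""
    dt_tuples = []
    for key, value in dicts[0].items():
        if isinstance(value, str):
            # Unicode field wide enough for the longest string under this key.
            value_dtype = 'U{}'.format(max(len(d[key]) for d in dicts))
        else:
            # Let NumPy infer the dtype from a sample value.
            value_dtype = np.array([value]).dtype
        dt_tuples.append((key, value_dtype))
    dt = np.dtype(dt_tuples)
    # Order each record's values by field name.
    values = [tuple(d[name] for name in dt.names) for d in dicts]
    # np.array gives a structured array; the view adds attribute access.
    return np.array(values, dtype=dt).view(np.recarray)


rec = dicts_to_recarray([dict(animal='cat', weight=5), dict(animal='dog', weight=20)])
print(rec.animal)        # ['cat' 'dog']
print(rec.weight.sum())  # 25
```

Fields stay accessible by name as well (rec['animal']), so the result works wherever a plain structured array does.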

Tags: python, numpy

Roger
