Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas DataFrame from list of lists of dicts

Tags:

python

pandas

I have a data structure that is a list of lists of dicts:

[
    [{'Height': 86, 'Left': 1385, 'Top': 215, 'Width': 86},
     {'Height': 87, 'Left': 865, 'Top': 266, 'Width': 87},
     {'Height': 103, 'Left': 271, 'Top': 506, 'Width': 103}],
    ...
]

I can convert it to a data frame:

detections[0:1]
df = pd.DataFrame(detections)
pd.DataFrame(df.apply(pd.Series).stack())

Which yields:

stacked data frame

This is almost what I want, but:

How would I turn the dictionary in each of the cells into a row with columns 'Left', 'Top', 'Width' 'Height'?

like image 970
fierval Avatar asked Feb 17 '17 17:02

fierval


2 Answers

To add to Psidom's answer, the list can also be flattened using itertools.chain.from_iterable.

from itertools import chain

pd.DataFrame(list(chain.from_iterable(detections)))

In my experiments this was about twice as fast for a large number of "chunks."

In [1]: %timeit [r for d in detections for r in d]
10000 loops, best of 3: 69.9 µs per loop

In [2]: %timeit list(chain.from_iterable(detections))
10000 loops, best of 3: 34 µs per loop

If you actually want the index in the final data frame to reflect the original grouping, you can accomplish this with

pd.DataFrame(detections).stack().apply(pd.Series)
       Height  Left  Top  Width
0   0      86  1385  215     86
    1      87   865  266     87
    2     103   271  506    103
1   0      86  1385  215     86
    1      87   865  266     87
    2     103   271  506    103

You were close, but you need to apply pd.Series after stacking the indices.

like image 65
Igor Raush Avatar answered Oct 29 '22 07:10

Igor Raush


You can loop through the list, construct a list of data frames and then concatenate them:

pd.concat([pd.DataFrame(d) for d in detections])

# Height    Left    Top  Width
#0    86    1385    215     86
#1    87     865    266     87
#2   103     271    506    103

Alternatively, flatten the list firstly and then call pd.DataFrame():

pd.DataFrame([r for d in detections for r in d])
like image 21
Psidom Avatar answered Oct 29 '22 07:10

Psidom