Convert list of dictionaries to a pandas DataFrame

Tags:

dataframe

I have a list of dictionaries like this:

    [{'points': 50, 'time': '5:00', 'year': 2010},
     {'points': 25, 'time': '6:00', 'month': "february"},
     {'points': 90, 'time': '9:00', 'month': 'january'},
     {'points_h1': 20, 'month': 'june'}]

And I want to turn this into a pandas DataFrame like this:

          month  points  points_h1  time  year
    0       NaN      50        NaN  5:00  2010
    1  february      25        NaN  6:00   NaN
    2   january      90        NaN  9:00   NaN
    3      june     NaN         20   NaN   NaN

Note: Order of the columns does not matter.

How can I turn the list of dictionaries into a pandas DataFrame as shown above?

asked Dec 17 '13 15:12

2 Answers

Supposing d is your list of dicts, simply:

df = pd.DataFrame(d)

Note: this does not work with nested data.
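As a quick check, here is a minimal sketch that applies the one-liner to the list from the question. The name `d` follows the answer above; the `print` calls are only there for illustration.

```python
import pandas as pd

# The list of dicts from the question; keys differ from record to record.
d = [
    {'points': 50, 'time': '5:00', 'year': 2010},
    {'points': 25, 'time': '6:00', 'month': 'february'},
    {'points': 90, 'time': '9:00', 'month': 'january'},
    {'points_h1': 20, 'month': 'june'},
]

# Each dict becomes one row; the columns are the union of all keys.
df = pd.DataFrame(d)

print(df.shape)            # number of rows and columns
print(sorted(df.columns))  # column order itself may vary, so sort for display
print(df.isna().sum())     # keys missing from a record show up as NaN
```

Keys that are absent from a record are filled with NaN, which is why numeric columns such as `points` may come back as floats.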

answered Oct 15 '22 08:10

joris

How do I convert a list of dictionaries to a pandas DataFrame?

The other answers are correct, but not much has been explained in terms of advantages and limitations of these methods. The aim of this post will be to show examples of these methods under different situations, discuss when to use (and when not to use), and suggest alternatives.

`DataFrame()`, `DataFrame.from_records()`, and `.from_dict()`

Depending on the structure and format of your data, there are situations where either all three methods work, or some work better than others, or some don't work at all.

Consider a very contrived example.

    import numpy as np
    import pandas as pd

    np.random.seed(0)
    data = pd.DataFrame(
        np.random.choice(10, (3, 4)), columns=list('ABCD')).to_dict('records')

    print(data)
    [{'A': 5, 'B': 0, 'C': 3, 'D': 3},
     {'A': 7, 'B': 9, 'C': 3, 'D': 5},
     {'A': 2, 'B': 4, 'C': 7, 'D': 6}]

This list consists of "records" with every key present. This is the simplest case you could encounter.

    # The following methods all produce the same output.
    pd.DataFrame(data)
    pd.DataFrame.from_dict(data)
    pd.DataFrame.from_records(data)

       A  B  C  D
    0  5  0  3  3
    1  7  9  3  5
    2  2  4  7  6
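If you want to convince yourself that the three constructors agree, a small sketch like the following can help. It writes `data` out literally instead of generating it randomly, and the round trip through `to_dict('records')` is just one convenient way to compare the results.

```python
import pandas as pd

data = [
    {'A': 5, 'B': 0, 'C': 3, 'D': 3},
    {'A': 7, 'B': 9, 'C': 3, 'D': 5},
    {'A': 2, 'B': 4, 'C': 7, 'D': 6},
]

frames = [
    pd.DataFrame(data),
    pd.DataFrame.from_dict(data),
    pd.DataFrame.from_records(data),
]

# Converting each frame back to records should reproduce the input list.
same = all(frame.to_dict('records') == data for frame in frames)
print(same)
```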

Word on Dictionary Orientations: `orient='index'`/`'columns'`

Before continuing, it is important to distinguish between the different dictionary orientations and how pandas supports each of them. There are two primary types: "columns" and "index".

orient='columns'
Dictionaries with the "columns" orientation will have their keys correspond to columns in the equivalent DataFrame.

For example, data above is in the "columns" orient.

    data_c = [
        {'A': 5, 'B': 0, 'C': 3, 'D': 3},
        {'A': 7, 'B': 9, 'C': 3, 'D': 5},
        {'A': 2, 'B': 4, 'C': 7, 'D': 6}]

    pd.DataFrame.from_dict(data_c, orient='columns')

       A  B  C  D
    0  5  0  3  3
    1  7  9  3  5
    2  2  4  7  6

Note: If you are using pd.DataFrame.from_records, the orientation is assumed to be "columns" (you cannot specify otherwise), and the dictionaries will be loaded accordingly.

orient='index'
With this orient, keys are assumed to correspond to index values. This kind of data is best suited for pd.DataFrame.from_dict.

    data_i = {
        0: {'A': 5, 'B': 0, 'C': 3, 'D': 3},
        1: {'A': 7, 'B': 9, 'C': 3, 'D': 5},
        2: {'A': 2, 'B': 4, 'C': 7, 'D': 6}}

    pd.DataFrame.from_dict(data_i, orient='index')

       A  B  C  D
    0  5  0  3  3
    1  7  9  3  5
    2  2  4  7  6

This case is not considered in the OP, but is still useful to know.
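To make the difference between the two orientations concrete, here is a small sketch using a dict of dicts. The row labels `'r0'` and `'r1'` are made up for illustration so that they are easy to tell apart from the column names.

```python
import pandas as pd

# Hypothetical "index"-oriented input: the outer keys are row labels.
data_i = {
    'r0': {'A': 5, 'B': 0},
    'r1': {'A': 7, 'B': 9},
}

# orient='index': outer keys become the index, inner keys become columns.
by_index = pd.DataFrame.from_dict(data_i, orient='index')

# Default orient='columns': outer keys become the columns instead.
by_columns = pd.DataFrame.from_dict(data_i)

print(by_index)
print(by_columns)
print(by_columns.T)  # transposing swaps rows and columns back
```

For this input, transposing the "columns" result gives the same layout as reading with `orient='index'`.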

Setting Custom Index

If you need a custom index on the resultant DataFrame, you can set it using the index=... argument.

    pd.DataFrame(data, index=['a', 'b', 'c'])
    # pd.DataFrame.from_records(data, index=['a', 'b', 'c'])

       A  B  C  D
    a  5  0  3  3
    b  7  9  3  5
    c  2  4  7  6

This is not supported by pd.DataFrame.from_dict.
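If you still want to go through `from_dict`, one possible workaround (not part of the original answer) is to build the frame first and assign the index afterwards. A minimal sketch, with `data` written out literally:

```python
import pandas as pd

data = [
    {'A': 5, 'B': 0, 'C': 3, 'D': 3},
    {'A': 7, 'B': 9, 'C': 3, 'D': 5},
    {'A': 2, 'B': 4, 'C': 7, 'D': 6},
]

df = pd.DataFrame.from_dict(data)

# Assign the labels after construction; the list length must match the row count.
df.index = ['a', 'b', 'c']

print(df)
```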

Dealing with Missing Keys/Columns

All methods work out-of-the-box when handling dictionaries with missing keys/column values. For example,

    data2 = [
        {'A': 5, 'C': 3, 'D': 3},
        {'A': 7, 'B': 9, 'F': 5},
        {'B': 4, 'C': 7, 'E': 6}]

    # The methods below all produce the same output.
    pd.DataFrame(data2)
    pd.DataFrame.from_dict(data2)
    pd.DataFrame.from_records(data2)

         A    B    C    D    E    F
    0  5.0  NaN  3.0  3.0  NaN  NaN
    1  7.0  9.0  NaN  NaN  NaN  5.0
    2  NaN  4.0  7.0  NaN  6.0  NaN
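A short sketch for inspecting where the gaps ended up. It only uses `isna()` and `dtypes`; the variable names `missing_per_column` and `column_dtypes` are illustrative.

```python
import pandas as pd

data2 = [
    {'A': 5, 'C': 3, 'D': 3},
    {'A': 7, 'B': 9, 'F': 5},
    {'B': 4, 'C': 7, 'E': 6},
]

df = pd.DataFrame(data2)

# Count the missing entries per column.
missing_per_column = df.isna().sum()
print(missing_per_column)

# Every column here contains at least one NaN, so the integers are stored as floats.
column_dtypes = df.dtypes
print(column_dtypes)
```

This is why the output above shows `5.0` rather than `5`: NaN is a float value, so a column that contains it cannot keep a plain integer dtype.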

Reading Subset of Columns

"What if I don't want to read in every single column?" You can easily specify this using the columns=... parameter.

For example, from the example dictionary of data2 above, if you wanted to read only columns 'A', 'D', and 'F', you can do so by passing a list:

    pd.DataFrame(data2, columns=['A', 'D', 'F'])
    # pd.DataFrame.from_records(data2, columns=['A', 'D', 'F'])

         A    D    F
    0  5.0  3.0  NaN
    1  7.0  NaN  5.0
    2  NaN  NaN  NaN

This is not supported by pd.DataFrame.from_dict with the default orient "columns".

pd.DataFrame.from_dict(data2, orient='columns', columns=['A', 'B'])

ValueError: cannot use columns parameter with orient='columns'
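If you are tied to `from_dict`, one way around this (a sketch, not something the original answer proposes) is to load everything and select the columns afterwards. The variable `error_message` is only there to show that the call above really raises.

```python
import pandas as pd

data2 = [
    {'A': 5, 'C': 3, 'D': 3},
    {'A': 7, 'B': 9, 'F': 5},
    {'B': 4, 'C': 7, 'E': 6},
]

# Passing columns= together with orient='columns' is rejected.
error_message = None
try:
    pd.DataFrame.from_dict(data2, orient='columns', columns=['A', 'D', 'F'])
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# Workaround: build the full frame, then keep only the columns of interest.
subset = pd.DataFrame.from_dict(data2)[['A', 'D', 'F']]
print(subset)
```

The trade-off is that every column is materialised before the selection, which may matter for very wide records.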

Reading Subset of Rows

None of these methods supports this directly. You will have to filter the data yourself before building the DataFrame, for example by iterating over it in reverse and deleting the unwanted entries in place. Note that this modifies data2. For example, to keep only the 0th and 2nd rows from data2 above, you can use:

    rows_to_select = {0, 2}
    for i in reversed(range(len(data2))):
        if i not in rows_to_select:
            del data2[i]

    pd.DataFrame(data2)
    # pd.DataFrame.from_dict(data2)
    # pd.DataFrame.from_records(data2)

         A    B  C    D    E
    0  5.0  NaN  3  3.0  NaN
    1  NaN  4.0  7  NaN  6.0
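If you would rather leave the original list untouched, a list comprehension is one alternative. This is a sketch rather than the approach used above; `selected` and `df_alt` are illustrative names.

```python
import pandas as pd

data2 = [
    {'A': 5, 'C': 3, 'D': 3},
    {'A': 7, 'B': 9, 'F': 5},
    {'B': 4, 'C': 7, 'E': 6},
]
rows_to_select = {0, 2}

# Build a filtered copy instead of deleting from data2 in place.
selected = [row for i, row in enumerate(data2) if i in rows_to_select]
df = pd.DataFrame(selected)
print(df)

# Alternative: load everything, then pick rows by position.
# This keeps the original row labels and also the all-NaN column 'F'.
df_alt = pd.DataFrame(data2).iloc[sorted(rows_to_select)]
print(df_alt)
```

Filtering before construction avoids creating columns that only the discarded rows would have populated.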

The Panacea: `json_normalize` for Nested Data

A strong, robust alternative to the methods outlined above is the json_normalize function, which works with lists of dictionaries (records) and can also handle nested dictionaries.

    pd.json_normalize(data)

       A  B  C  D
    0  5  0  3  3
    1  7  9  3  5
    2  2  4  7  6

    # data2 here is the list as modified in place in the previous section.
    pd.json_normalize(data2)

         A    B  C    D    E
    0  5.0  NaN  3  3.0  NaN
    1  NaN  4.0  7  NaN  6.0

Again, keep in mind that the data passed to json_normalize needs to be in the list-of-dictionaries (records) format.

As mentioned, json_normalize can also handle nested dictionaries. Here's an example taken from the documentation.

    data_nested = [
      {'counties': [{'name': 'Dade', 'population': 12345},
                    {'name': 'Broward', 'population': 40000},
                    {'name': 'Palm Beach', 'population': 60000}],
       'info': {'governor': 'Rick Scott'},
       'shortname': 'FL',
       'state': 'Florida'},
      {'counties': [{'name': 'Summit', 'population': 1234},
                    {'name': 'Cuyahoga', 'population': 1337}],
       'info': {'governor': 'John Kasich'},
       'shortname': 'OH',
       'state': 'Ohio'}
    ]

    pd.json_normalize(data_nested,
                      record_path='counties',
                      meta=['state', 'shortname', ['info', 'governor']])

             name  population    state shortname info.governor
    0        Dade       12345  Florida        FL    Rick Scott
    1     Broward       40000  Florida        FL    Rick Scott
    2  Palm Beach       60000  Florida        FL    Rick Scott
    3      Summit        1234     Ohio        OH   John Kasich
    4    Cuyahoga        1337     Ohio        OH   John Kasich

For more information on the meta and record_path arguments, check out the documentation.
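For the simpler case of nested dictionaries without an inner list of records, the contrast with the plain constructor can be sketched as follows. The `records` list is a trimmed-down, hypothetical version of the data above, and this assumes a pandas version where `pd.json_normalize` is available at the top level (1.0 or later).

```python
import pandas as pd

# Trimmed-down records: one nested dict per row, no inner lists.
records = [
    {'state': 'Florida', 'info': {'governor': 'Rick Scott'}},
    {'state': 'Ohio', 'info': {'governor': 'John Kasich'}},
]

# The plain constructor keeps each nested dict as a single object in the cell.
plain = pd.DataFrame(records)
print(plain)

# json_normalize flattens nested keys into dotted column names.
flat = pd.json_normalize(records)
print(flat)
```

This is the limitation the first answer refers to: `pd.DataFrame` does not fail on nested data, it just leaves the inner dictionaries unexpanded.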

Summarising

Here's a table of all the methods discussed above, along with supported features/functionality.

[Image: summary table of the methods and the features each supports]

* Use orient='columns' and then transpose to get the same effect as orient='index'.

answered Oct 15 '22 07:10

cs95

Tags: python, dictionary, pandas, dataframe

Asked by appleLover
