I am new to Python so this may be pretty straightforward, but I have not been able to find a good answer for my problem after looking for a while. I am trying to create a Pandas dataframe from a list of dictionaries.
My list of nested dictionaries is the following:
my_list = [{0: {'a': '23', 'b': '15', 'c': '5', 'd': '-1'},
1: {'a': '5', 'b': '6', 'c': '7', 'd': '9'},
2: {'a': '9', 'b': '15', 'c': '5', 'd': '7'}},
{0: {'a': '5', 'b': '249', 'c': '92', 'd': '-4'},
1: {'a': '51', 'b': '5', 'c': '34', 'd': '1'},
2: {'a': '3', 'b': '8', 'c': '3', 'd': '11'}}]
So each dictionary in the list has three keys (0, 1 and 2), and each key maps to an inner dictionary with the values for a, b, c and d.
Putting these into a dataframe using data = pd.DataFrame(my_list)
returns something unusable, as each cell has information on a, b, c and d in it.
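To make the problem concrete, here is a minimal sketch of what the direct constructor call seems to produce with the data above. The variable name data follows the call in the question; the printed checks are only there for illustration.

```python
import pandas as pd

my_list = [{0: {'a': '23', 'b': '15', 'c': '5', 'd': '-1'},
            1: {'a': '5', 'b': '6', 'c': '7', 'd': '9'},
            2: {'a': '9', 'b': '15', 'c': '5', 'd': '7'}},
           {0: {'a': '5', 'b': '249', 'c': '92', 'd': '-4'},
            1: {'a': '51', 'b': '5', 'c': '34', 'd': '1'},
            2: {'a': '3', 'b': '8', 'c': '3', 'd': '11'}}]

data = pd.DataFrame(my_list)

# One row per outer dictionary and one column per key (0, 1, 2);
# each cell holds a whole inner dictionary instead of a single value.
print(data.shape)
print(type(data.iloc[0, 0]))
```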
I want to end up with a dataframe that looks like this:
name | a  | b   | c  | d
0    | 23 | 15  | 5  | -1
1    | 5  | 6   | 7  | 9
2    | 9  | 15  | 5  | 7
0    | 5  | 249 | 92 | -4
1    | 51 | 5   | 34 | 1
2    | 3  | 8   | 3  | 11
Is this possible?
DataFrame is a two-dimensional pandas data structure used to represent tabular data in rows and columns. We can create a pandas DataFrame object from a Python list of dictionaries.
We first take the list of nested dictionaries and extract the rows of data from it with a for loop. Then we use another for loop to append the rows to a new list, which was originally created empty. Finally, we pass that list to the DataFrame constructor in the pandas library to create the DataFrame.
Pandas DataFrame is a 2-dimensional labeled data structure with columns of potentially different types. It is generally the most commonly used pandas object. Pandas DataFrame can be created in multiple ways using Python. Let's discuss how to create a Pandas DataFrame from the List of Dictionaries.
Easy:
pd.concat([pd.DataFrame(l) for l in my_list],axis=1).T
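For anyone who wants to run the one-liner above as-is, here is a self-contained sketch using the data from the question. The name result and the final rename_axis call are my own additions for illustration, not part of the answer.

```python
import pandas as pd

my_list = [{0: {'a': '23', 'b': '15', 'c': '5', 'd': '-1'},
            1: {'a': '5', 'b': '6', 'c': '7', 'd': '9'},
            2: {'a': '9', 'b': '15', 'c': '5', 'd': '7'}},
           {0: {'a': '5', 'b': '249', 'c': '92', 'd': '-4'},
            1: {'a': '51', 'b': '5', 'c': '34', 'd': '1'},
            2: {'a': '3', 'b': '8', 'c': '3', 'd': '11'}}]

# Each outer dictionary becomes a small frame with columns 0, 1, 2 and rows a-d.
# Concatenating side by side and transposing turns every inner dictionary into a row.
result = pd.concat([pd.DataFrame(d) for d in my_list], axis=1).T

# Optional: label the index so it matches the 'name' column asked for in the question.
result = result.rename_axis('name')
print(result)
```

Note that the values are still strings at this point, because they are strings in the input.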
Another solution:
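from itertools import chain
# Note: written for Python 2 and pandas < 1.0; dict.iteritems() is Python 2 only and DataFrame.from_items() was removed in pandas 1.0.
pd.DataFrame.from_items(list(chain.from_iterable(d.iteritems() for d in my_list))).T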
In my experiments, this is faster than using pd.concat
(especially when the number of "sub-dataframes" is large) at the cost of being more verbose.
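On Python 3 with a current pandas, the same flattening idea can probably be written along the following lines. This is a sketch of an equivalent rather than the original code: the names pairs, keys, rows and df are mine, and I have not repeated the timing comparison with this variant.

```python
from itertools import chain

import pandas as pd

my_list = [{0: {'a': '23', 'b': '15', 'c': '5', 'd': '-1'},
            1: {'a': '5', 'b': '6', 'c': '7', 'd': '9'},
            2: {'a': '9', 'b': '15', 'c': '5', 'd': '7'}},
           {0: {'a': '5', 'b': '249', 'c': '92', 'd': '-4'},
            1: {'a': '51', 'b': '5', 'c': '34', 'd': '1'},
            2: {'a': '3', 'b': '8', 'c': '3', 'd': '11'}}]

# Flatten the list of dictionaries into one sequence of (key, inner dict) pairs.
pairs = list(chain.from_iterable(d.items() for d in my_list))
keys = [key for key, _ in pairs]
rows = [row for _, row in pairs]

# A list of flat dictionaries is accepted directly by the DataFrame constructor;
# the outer keys become the (non-unique) index.
df = pd.DataFrame(rows, index=keys)
df.index.name = 'name'
print(df)
```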
You can munge the list of dictionaries to be acceptable to a DataFrame constructor:
In [4]: pd.DataFrame.from_records([{'name': k, **v} for d in my_list for k,v in d.items()])
Out[4]:
    a    b   c   d  name
0  23   15   5  -1     0
1   5    6   7   9     1
2   9   15   5   7     2
3   5  249  92  -4     0
4  51    5  34   1     1
5   3    8   3  11     2
In [5]: df = pd.DataFrame.from_records([{'name': k, **v} for d in my_list for k,v in d.items()])
In [6]: df.set_index('name',inplace=True)
In [7]: df
Out[7]:
       a    b   c   d
name
0     23   15   5  -1
1      5    6   7   9
2      9   15   5   7
0      5  249  92  -4
1     51    5  34   1
2      3    8   3  11
This requires Python 3.5 or later for the dictionary unpacking syntax {'name': 'something', **rest}
to work. It is merely shorthand for the following:
In [13]: reshaped = []
    ...: for d in my_list:
    ...:     for k, v in d.items():
    ...:         new = {'name': k}
    ...:         new.update(v)
    ...:         reshaped.append(new)
    ...:
In [14]: reshaped
Out[14]:
[{'a': '23', 'b': '15', 'c': '5', 'd': '-1', 'name': 0},
{'a': '5', 'b': '6', 'c': '7', 'd': '9', 'name': 1},
{'a': '9', 'b': '15', 'c': '5', 'd': '7', 'name': 2},
{'a': '5', 'b': '249', 'c': '92', 'd': '-4', 'name': 0},
{'a': '51', 'b': '5', 'c': '34', 'd': '1', 'name': 1},
{'a': '3', 'b': '8', 'c': '3', 'd': '11', 'name': 2}]
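One thing to keep in mind: the values in the input are strings, so the resulting columns hold strings too. If numbers are wanted, as the table in the question suggests, a conversion step along these lines should work. This is a sketch that assumes every value parses as a number; it reuses the reshaped list from above.

```python
import pandas as pd

reshaped = [{'a': '23', 'b': '15', 'c': '5', 'd': '-1', 'name': 0},
            {'a': '5', 'b': '6', 'c': '7', 'd': '9', 'name': 1},
            {'a': '9', 'b': '15', 'c': '5', 'd': '7', 'name': 2},
            {'a': '5', 'b': '249', 'c': '92', 'd': '-4', 'name': 0},
            {'a': '51', 'b': '5', 'c': '34', 'd': '1', 'name': 1},
            {'a': '3', 'b': '8', 'c': '3', 'd': '11', 'name': 2}]

df = pd.DataFrame.from_records(reshaped).set_index('name')

# Convert each column from strings to a numeric dtype.
# pd.to_numeric raises an error if a value cannot be parsed.
df = df.apply(pd.to_numeric)
print(df.dtypes)
```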