I'd like to efficiently create a pandas DataFrame from a Python collections.Counter dictionary, but there's an additional requirement.
The Counter dictionary looks like this:
(a, b) : 5
(c, d) : 7
(a, d) : 2
The dictionary keys are tuples: the first element should become the row label and the second the column label of the DataFrame.
The resulting DataFrame should look like this:
   b  d
a  5  2
c  0  7
For larger data I don't want to build the DataFrame by growing it one cell at a time (df[a][b] = 5, and so on), as that is incredibly inefficient: it creates a copy of the DataFrame every time such an extension is done (or so I'm led to believe).
Perhaps the right answer is to go via a numpy array?
Dictionary keys can be strings, numbers (int or float), or tuples, and values can be of any type. A dictionary can also be passed to the DataFrame constructor: the keys become the column names and the values form the columns.
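As a small illustration of passing a dictionary to the DataFrame constructor, here is a minimal sketch. The column names and values are made up for the example and do not come from the question.

```python
import pandas as pd

# Hypothetical example data: each key becomes a column name,
# and each list supplies that column's values.
data = {"label": ["x", "y"], "count": [5, 7]}

frame = pd.DataFrame(data)
print(frame)
```

Note that this puts the keys on the column axis only, so on its own it does not split tuple keys into rows and columns.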
We first take the list of nested dictionaries and extract the rows of data from it. A for loop then appends each row to a new list that starts out empty. Finally, we pass that list to the pandas DataFrame constructor to create the DataFrame.
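The loop-and-append approach described above could look roughly like the following sketch. The structure of the nested dictionaries (an "id" key and an "info" sub-dictionary) is an assumption made for illustration only.

```python
import pandas as pd

# Hypothetical list of nested dictionaries.
records = [
    {"id": 1, "info": {"row": "a", "col": "b", "count": 5}},
    {"id": 2, "info": {"row": "c", "col": "d", "count": 7}},
    {"id": 3, "info": {"row": "a", "col": "d", "count": 2}},
]

# Start with an empty list and append one flat row per record.
rows = []
for record in records:
    info = record["info"]
    rows.append([record["id"], info["row"], info["col"], info["count"]])

# Build the DataFrame once, from the complete list of rows.
frame = pd.DataFrame(rows, columns=["id", "row", "col", "count"])
print(frame)
```

Collecting rows in a plain list and constructing the DataFrame once at the end avoids growing the DataFrame step by step.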
pandas MultiIndex to columns: use the DataFrame.reset_index() function to convert MultiIndex (multi-level index) levels to columns. The default is drop=False, which keeps the index values as columns and gives the DataFrame a new integer index starting from zero.
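To make the effect of reset_index() concrete, here is a minimal sketch using the tuples from the question as a MultiIndex. The level names "row" and "col" and the column name "count" are assumptions chosen for the example.

```python
import pandas as pd

# Build a small DataFrame with a two-level index (names are illustrative).
index = pd.MultiIndex.from_tuples(
    [("a", "b"), ("c", "d"), ("a", "d")], names=["row", "col"]
)
frame = pd.DataFrame({"count": [5, 7, 2]}, index=index)

# Default drop=False: the index levels become ordinary columns
# and the result gets a fresh integer index starting at zero.
flat = frame.reset_index()

# drop=True: the index levels are discarded instead of kept as columns.
dropped = frame.reset_index(drop=True)

print(flat)
print(dropped)
```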
The DataFrame is the most commonly used pandas object. There are multiple ways to create one from lists using a dictionary; for example, a dictionary of lists can be transformed directly into a DataFrame.
Use a Series with unstack:
import pandas as pd
pd.Series(d).unstack(fill_value=0)
Out[708]:
   b  d
a  5  2
c  0  7
Input data:
d = {('a', 'b'): 5,
     ('c', 'd'): 7,
     ('a', 'd'): 2}
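Putting the input data and the unstack call together, a self-contained sketch starting from an actual collections.Counter might look like this. The variable names counts and table are illustrative.

```python
from collections import Counter

import pandas as pd

# Counter keyed by (row, column) tuples, as in the question.
counts = Counter({("a", "b"): 5, ("c", "d"): 7, ("a", "d"): 2})

# A Series built from tuple keys gets a two-level MultiIndex;
# unstack pivots the second level into columns and fills the
# missing (row, column) combinations with 0.
table = pd.Series(counts).unstack(fill_value=0)
print(table)
```

This builds the whole table in one step rather than extending a DataFrame cell by cell.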