
Create MultiIndex pandas DataFrame from dictionary with tuple keys

Tags: python, dictionary, pandas, multi-index

I'd like to efficiently create a pandas DataFrame from a Python collections.Counter dictionary, but there's an additional requirement.

The Counter dictionary looks like this:

(a, b) : 5
(c, d) : 7
(a, d) : 2

The dictionary keys are tuples: the first element should become the row label and the second the column label of the DataFrame.

The resulting DataFrame should look like this:

   b  d
a  5  2
c  0  7

For larger data I don't want to build the DataFrame by growing it one cell at a time (df[a][b] = 5 and so on), since I'm led to believe that is very inefficient: each such extension creates a copy of the DataFrame.

Perhaps the right answer is to go via a numpy array?


asked Jan 18 '19 17:01

Intuitive Text Mining

1 Answer

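Use a Series with unstack. A Series built from a dict with tuple keys gets a MultiIndex; unstack then moves the second key level into the columns, and fill_value=0 fills in the pairs that are missing.

Input data

import pandas as pd

d = {('a', 'b'): 5,
     ('c', 'd'): 7,
     ('a', 'd'): 2}

Result

pd.Series(d).unstack(fill_value=0)
Out[708]: 
   b  d
a  5  2
c  0  7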
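
Since the question starts from a collections.Counter rather than a plain dict, the same Series/unstack approach can be sketched end to end as below. The names `pairs`, `counts` and `table` are illustrative only, and the sample pairs are made up to reproduce the counts from the question.

```python
from collections import Counter

import pandas as pd

# Hypothetical sample data: (row, column) pairs repeated so that the
# counts match the question (5, 7 and 2).
pairs = [("a", "b")] * 5 + [("c", "d")] * 7 + [("a", "d")] * 2
counts = Counter(pairs)

# Counter is a dict subclass; dict() makes that explicit for pandas.
# Tuple keys become a two-level MultiIndex on the Series.
series = pd.Series(dict(counts))

# Pivot the second level into columns; absent pairs become 0.
table = series.unstack(fill_value=0)
print(table)
```

The whole table is built in a single pass, so no cell-by-cell assignment into the DataFrame is needed.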


answered Oct 05 '22 17:10

BENY
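
For comparison, the NumPy route the question wonders about might look roughly like the sketch below. This is not part of the answer above, only one possible way to do it: it assumes the row and column labels are sortable values such as strings, and `counts`, `grid` and the other names are illustrative.

```python
from collections import Counter

import numpy as np
import pandas as pd

counts = Counter({("a", "b"): 5, ("c", "d"): 7, ("a", "d"): 2})

# Split the tuple keys into row labels and column labels.
rows, cols = zip(*counts.keys())

# np.unique returns the sorted distinct labels plus, for each key,
# the position of its label in that sorted array.
row_labels, row_codes = np.unique(rows, return_inverse=True)
col_labels, col_codes = np.unique(cols, return_inverse=True)

# Allocate the full grid once, then fill every counted cell in one
# vectorised assignment; cells without a count stay 0.
grid = np.zeros((len(row_labels), len(col_labels)), dtype=np.int64)
grid[row_codes, col_codes] = list(counts.values())

df = pd.DataFrame(grid, index=row_labels, columns=col_labels)
print(df)
```

Like the unstack version, this allocates the result once instead of growing a DataFrame, at the cost of a dense array whose size is the number of distinct rows times the number of distinct columns.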
