How to unpack a Series of tuples in Pandas?

Tags: python, pandas

Sometimes I end up with a series of tuples/lists when using Pandas. This is common when, for example, doing a group-by and passing a function that has multiple return values:

import numpy as np
import pandas as pd
from scipy import stats

df = pd.DataFrame(dict(x=np.random.randn(100),
                       y=np.repeat(list("abcd"), 25)))
out = df.groupby("y").x.apply(stats.ttest_1samp, 0)
print(out)

y
a       (1.3066417476, 0.203717485506)
b    (0.0801133382517, 0.936811414675)
c      (1.55784329113, 0.132360504653)
d     (0.267999459642, 0.790989680709)
dtype: object

What is the correct way to "unpack" this structure so that I get a DataFrame with two columns?

A related question is how I can unpack either this structure or the resulting dataframe into two Series/array objects. This almost works:

t, p = zip(*out)

but t is

(array(1.3066417475999257),
 array(0.08011333825171714),
 array(1.557843291126335),
 array(0.267999459641651))

and one needs to take the extra step of squeezing it.


asked Apr 02 '14 00:04

mwaskom

2 Answers

Maybe this is the most straightforward (and most Pythonic, I guess); assign the result back so that out becomes a DataFrame:

out = out.apply(pd.Series)

If you want to rename the columns to something more meaningful, then:

out.columns=['Kstats','Pvalue']

If you do not want the index to keep its name (y, inherited from the group-by key):

out.index.name=None
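Here is a self-contained sketch of the same idea. It assumes a made-up function, mean_and_std, standing in for stats.ttest_1samp so that it runs without SciPy; the column names are likewise only illustrative.

```python
import numpy as np
import pandas as pd


def mean_and_std(values):
    # Hypothetical stand-in for any function that returns two values.
    return values.mean(), values.std()


df = pd.DataFrame({"x": np.arange(8, dtype=float),
                   "y": np.repeat(list("ab"), 4)})

# A Series of tuples (dtype object), indexed by the group key.
out = df.groupby("y").x.apply(mean_and_std)

# Expand each tuple into its own row of a DataFrame.
wide = out.apply(pd.Series)
wide.columns = ["mean", "std"]   # illustrative names
wide.index.name = None           # drop the "y" label on the index
```

Note that apply(pd.Series) builds one Series per row, so it can be slow on long inputs.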

answered Sep 19 '22 14:09

Siraj S.

Maybe:

>>> pd.DataFrame(out.tolist(), columns=['out-1','out-2'], index=out.index)
                  out-1     out-2
y
a   -1.9153853424536496  0.067433
b     1.277561889173181  0.213624
c  0.062021492729736116  0.951059
d    0.3036745009819999  0.763993

[4 rows x 2 columns]
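For the second part of the question (getting two separate Series or arrays), a sketch along the same lines might look like this. The Series is rebuilt by hand from the values printed in the question, and the column names "t" and "p" are assumptions chosen for illustration.

```python
import pandas as pd

# The Series of tuples from the question, rebuilt by hand.
out = pd.Series(
    [
        (1.3066417476, 0.203717485506),
        (0.0801133382517, 0.936811414675),
        (1.55784329113, 0.132360504653),
        (0.267999459642, 0.790989680709),
    ],
    index=pd.Index(list("abcd"), name="y"),
)

# One column per tuple element; "t" and "p" are illustrative names.
frame = pd.DataFrame(out.tolist(), columns=["t", "p"], index=out.index)

# Two Series that keep the original index.
t, p = frame["t"], frame["p"]

# Two plain 1-D float arrays, with no squeezing needed.
t_arr, p_arr = frame.to_numpy().T
```

Going through the DataFrame avoids the tuple of 0-d arrays that zip(*out) produced in the question.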

answered Sep 16 '22 14:09

behzad.nouri
