How to unpack a Series of tuples in Pandas?

Tags: python, pandas

Sometimes I end up with a series of tuples/lists when using Pandas. This is common when, for example, doing a group-by and passing a function that has multiple return values:

import numpy as np
import pandas as pd
from scipy import stats

df = pd.DataFrame(dict(x=np.random.randn(100),
                       y=np.repeat(list("abcd"), 25)))
out = df.groupby("y").x.apply(stats.ttest_1samp, 0)
print(out)

y
a       (1.3066417476, 0.203717485506)
b    (0.0801133382517, 0.936811414675)
c      (1.55784329113, 0.132360504653)
d     (0.267999459642, 0.790989680709)
dtype: object

What is the correct way to "unpack" this structure so that I get a DataFrame with two columns?

A related question is how I can unpack either this structure or the resulting dataframe into two Series/array objects. This almost works:

t, p = zip(*out) 

but t is

 (array(1.3066417475999257),  array(0.08011333825171714),  array(1.557843291126335),  array(0.267999459641651)) 

and one needs to take the extra step of squeezing it.

mwaskom asked Apr 02 '14 00:04



2 Answers

Maybe this is the most straightforward (and, I guess, the most Pythonic) way:

out = out.apply(pd.Series)

If you want to rename the columns to something more meaningful, then:

out.columns = ['Kstats', 'Pvalue']

If you do not want the default name for the index:

out.index.name = None
Siraj S. answered Sep 19 '22 14:09
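For anyone who wants to try the apply(pd.Series) approach without scipy installed, here is a minimal sketch. It assumes a hand-built Series of tuples as a stand-in for the groupby result in the question; the values, the variable name res, and the column names are illustrative, not taken from the original post.

```python
import pandas as pd

# Hypothetical stand-in for the groupby result in the question:
# a Series of (statistic, p-value) tuples indexed by group label.
out = pd.Series(
    [(1.31, 0.20), (0.08, 0.94), (1.56, 0.13), (0.27, 0.79)],
    index=pd.Index(list("abcd"), name="y"),
)

# apply(pd.Series) turns each tuple into a row and returns a new
# DataFrame, so the result has to be assigned to a name.
res = out.apply(pd.Series)

res.columns = ["tstat", "pvalue"]  # illustrative column names
res.index.name = None              # drop the "y" label on the index

print(res)
```

Note that apply(pd.Series) builds one Series object per row, so it may be noticeably slower than other approaches on long inputs.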


maybe:

>>> pd.DataFrame(out.tolist(), columns=['out-1','out-2'], index=out.index)
                  out-1     out-2
y
a   -1.9153853424536496  0.067433
b     1.277561889173181  0.213624
c  0.062021492729736116  0.951059
d    0.3036745009819999  0.763993

[4 rows x 2 columns]
behzad.nouri answered Sep 16 '22 14:09
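As a rough sketch of how the tolist() constructor could also cover the related question (getting two separate arrays), the example below builds the DataFrame and then pulls each column out. It again assumes a hand-built Series of plain float tuples rather than real ttest_1samp output; the names df2, t, and p are illustrative.

```python
import pandas as pd

# Hypothetical stand-in for the Series of tuples from the question.
out = pd.Series(
    [(1.31, 0.20), (0.08, 0.94), (1.56, 0.13), (0.27, 0.79)],
    index=pd.Index(list("abcd"), name="y"),
)

# Build the two-column frame directly from the list of tuples,
# reusing the original index so the group labels are preserved.
df2 = pd.DataFrame(out.tolist(), columns=["t", "p"], index=out.index)

# Each column is an ordinary float Series; to_numpy() gives 1-D arrays,
# so no extra squeezing step is needed with plain float tuples.
t = df2["t"].to_numpy()
p = df2["p"].to_numpy()

print(t)
print(p)
```

If the tuples hold 0-d NumPy arrays instead of plain floats, as in the question's output, the columns may come out with object dtype; calling df2.astype(float) before extracting the arrays is one way to handle that.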