
How to convert a series of tuples into a pandas dataframe?

Assume that we have the following pandas Series, produced by calling apply on a DataFrame after a groupby.

<class 'pandas.core.series.Series'>
0        (1, 0, [0.2, 0.2, 0.2], [0.2, 0.2, 0.2])
1     (2, 1000, [0.6, 0.7, 0.5], [0.1, 0.3, 0.1])
2        (1, 0, [0.4, 0.4, 0.4], [0.4, 0.4, 0.4])
3        (1, 0, [0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
4    (3, 14000, [0.8, 0.8, 0.8], [0.6, 0.6, 0.6])
dtype: object

Given sigList = ['sig1', 'sig2', 'sig3'], can we convert this into a DataFrame like the following?

Length  Distance  sig1Max  sig2Max  sig3Max  sig1Min  sig2Min  sig3Min
     1         0      0.2      0.2      0.2      0.2      0.2      0.2
     2      1000      0.6      0.7      0.5      0.1      0.3      0.1
     1         0      0.4      0.4      0.4      0.4      0.4      0.4
     1         0      0.5      0.5      0.5      0.5      0.5      0.5
     3     14000      0.8      0.8      0.8      0.6      0.6      0.6
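To try the answers below, the example Series can be rebuilt by hand. This is a minimal sketch that hardcodes the five tuples printed above; in the real case the Series comes from the groupby/apply call, which the question does not show. The first answer calls it series and the second calls it s, so both names are bound here.

```python
import pandas as pd

# Each element is a tuple: (Length, Distance, [max per signal], [min per signal])
series = pd.Series([
    (1, 0, [0.2, 0.2, 0.2], [0.2, 0.2, 0.2]),
    (2, 1000, [0.6, 0.7, 0.5], [0.1, 0.3, 0.1]),
    (1, 0, [0.4, 0.4, 0.4], [0.4, 0.4, 0.4]),
    (1, 0, [0.5, 0.5, 0.5], [0.5, 0.5, 0.5]),
    (3, 14000, [0.8, 0.8, 0.8], [0.6, 0.6, 0.6]),
])
s = series  # the second answer refers to the Series as s
sigList = ['sig1', 'sig2', 'sig3']

print(type(series))  # <class 'pandas.core.series.Series'>
print(series.dtype)  # object
```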

Thanks in advance

burcak asked Nov 20 '18 22:11




2 Answers

Do it the old-fashioned (and fast) way, using a list comprehension:

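import pandas as pd

# Note the trailing space after sig2Max: adjacent string literals are joined as-is
columns = ("Length Distance sig1Max sig2Max "
           "sig3Max sig1Min sig2Min sig3Min").split()
# Unpack each (length, distance, maxima, minima) tuple into one flat row
df = pd.DataFrame([[a, b, *c, *d] for a, b, c, d in series.values], columns=columns)
print(df)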
   Length  Distance  sig1Max  sig2Max  sig3Max  sig1Min  sig2Min  sig3Min
0       1         0      0.2      0.2      0.2      0.2      0.2      0.2
1       2      1000      0.6      0.7      0.5      0.1      0.3      0.1
2       1         0      0.4      0.4      0.4      0.4      0.4      0.4
3       1         0      0.5      0.5      0.5      0.5      0.5      0.5
4       3     14000      0.8      0.8      0.8      0.6      0.6      0.6
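The key step is the starred unpacking inside the list brackets: each tuple (a, b, c, d) becomes one flat row, with the items of c and d spliced in. A small pure-Python illustration on the second tuple alone, plus a check of how the space inside the column-name literals affects the count:

```python
# One element of the Series: (Length, Distance, maxima, minima)
row = (2, 1000, [0.6, 0.7, 0.5], [0.1, 0.3, 0.1])

a, b, c, d = row          # tuple unpacking into four names
flat = [a, b, *c, *d]     # *c and *d splice the list items in place
print(flat)               # [2, 1000, 0.6, 0.7, 0.5, 0.1, 0.3, 0.1]

# Adjacent string literals are concatenated with no separator,
# so a missing space merges two column names into one.
broken = ("Length Distance sig1Max sig2Max"
          "sig3Max sig1Min sig2Min sig3Min").split()
fixed = ("Length Distance sig1Max sig2Max "
         "sig3Max sig1Min sig2Min sig3Min").split()
print(len(broken), len(fixed))  # 7 8
```

With only 7 names for 8 values per row, the DataFrame constructor would reject the column list, so the space matters.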

Or, perhaps you meant doing it a little more dynamically, building the column names from sigList:

sigList = ['sig1', 'sig2', 'sig3']

# Fixed columns first, then one Max column per signal followed by one Min column per signal
columns = ['Length', 'Distance']
columns.extend(f'{s}{lbl}' for lbl in ('Max', 'Min') for s in sigList)

df = pd.DataFrame([[a, b, *c, *d] for a, b, c, d in series.values], columns=columns)
print(df)
   Length  Distance  sig1Max  sig2Max  sig3Max  sig1Min  sig2Min  sig3Min
0       1         0      0.2      0.2      0.2      0.2      0.2      0.2
1       2      1000      0.6      0.7      0.5      0.1      0.3      0.1
2       1         0      0.4      0.4      0.4      0.4      0.4      0.4
3       1         0      0.5      0.5      0.5      0.5      0.5      0.5
4       3     14000      0.8      0.8      0.8      0.6      0.6      0.6
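Both versions go through series.values, so the result gets a fresh 0..n-1 index. Since the Series came out of a groupby/apply, its index may hold the group keys you want to keep. A minimal sketch, assuming that is the case, which passes the original index through and checks each list length against sigList; the function name tuples_to_frame and the index labels 'A' and 'B' are hypothetical, chosen only for illustration.

```python
import pandas as pd

def tuples_to_frame(series, sig_list):
    """Hypothetical helper: flatten (length, distance, maxima, minima) tuples."""
    columns = ['Length', 'Distance']
    columns += [f'{s}Max' for s in sig_list]
    columns += [f'{s}Min' for s in sig_list]
    rows = []
    for a, b, c, d in series:  # iterating a Series yields its values
        # Guard against a tuple whose lists don't match the number of signals
        if len(c) != len(sig_list) or len(d) != len(sig_list):
            raise ValueError(f'expected {len(sig_list)} values, got {len(c)} and {len(d)}')
        rows.append([a, b, *c, *d])
    # index=series.index keeps whatever labels the groupby/apply produced
    return pd.DataFrame(rows, columns=columns, index=series.index)

# Example with a made-up group-key index
s = pd.Series(
    [(1, 0, [0.2, 0.2, 0.2], [0.2, 0.2, 0.2]),
     (2, 1000, [0.6, 0.7, 0.5], [0.1, 0.3, 0.1])],
    index=['A', 'B'],
)
df = tuples_to_frame(s, ['sig1', 'sig2', 'sig3'])
print(df)
```

Raising early on a length mismatch is a design choice; without the check, pandas would report a less specific shape error when building the frame.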
cs95 answered Oct 27 '22 01:10


Alternatively, expand the tuples first and then the two list columns:

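import pandas as pd

newdf = pd.DataFrame(s.tolist())  # columns 0..3, one per tuple position
# Expand list columns 2 (maxima) and 3 (minima); axis=1 joins the pieces side by side
newdf = pd.concat([newdf[[0, 1]],
                   pd.DataFrame(newdf[2].tolist()),
                   pd.DataFrame(newdf[3].tolist())], axis=1)
newdf.columns = [
    "Length", "Distance", "sig1Max", "sig2Max", "sig3Max", "sig1Min", "sig2Min", "sig3Min"
]
newdf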
Out[163]: 
   Length  Distance  sig1Max   ...     sig1Min  sig2Min  sig3Min
0       1         0      0.2   ...         0.2      0.2      0.2
1       2      1000      0.6   ...         0.1      0.3      0.1
2       1         0      0.4   ...         0.4      0.4      0.4
3       1         0      0.5   ...         0.5      0.5      0.5
4       3     14000      0.8   ...         0.6      0.6      0.6
[5 rows x 8 columns]
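The first pd.DataFrame(s.tolist()) only splits each tuple into four columns labelled 0 to 3; columns 2 and 3 still hold Python lists, which is why they are expanded a second time before the concat. A small sketch of that intermediate state, using two of the tuples from the question:

```python
import pandas as pd

s = pd.Series([
    (1, 0, [0.2, 0.2, 0.2], [0.2, 0.2, 0.2]),
    (2, 1000, [0.6, 0.7, 0.5], [0.1, 0.3, 0.1]),
])

step1 = pd.DataFrame(s.tolist())   # one column per tuple position
print(step1.columns.tolist())      # [0, 1, 2, 3]
print(step1.loc[1, 2])             # [0.6, 0.7, 0.5]  (still a list)

# A list of equal-length lists becomes one numeric column per position
maxima = pd.DataFrame(step1[2].tolist())
print(maxima.shape)                # (2, 3)
```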
BENY answered Oct 26 '22 23:10