Transpose dataframe based on column list

Tags: python, pandas, dataframe

I have a dataframe in the following structure:

cNames  | cValues   |  number  
[a,b,c] | [1,2,3]   |  10      
[a,b,d] | [55,66,77]|  20

I would like to transpose it, that is, create columns from the names in cNames.
I can't achieve this with transpose alone, because I want one column for each distinct name in the lists.
The desired output:

a   | b   | c   | d   |  number
1   | 2   | 3   | NaN | 10
55  | 66  | NaN | 77  | 20

How can I achieve this result?
Thanks!

The code to create the DF:

import pandas as pd

d = {'cNames': [['a','b','c'], ['a','b','d']],
     'cValues': [[1,2,3], [55,66,77]],
     'number': [10,20]}
df = pd.DataFrame(data=d)

asked Feb 05 '21 21:02

Dave

3 Answers

One option is concat:

pd.concat([pd.Series(x['cValues'], x['cNames'], name=idx) 
               for idx, x in df.iterrows()], 
          axis=1
         ).T.join(df.iloc[:,2:])

Or a DataFrame construction:

pd.DataFrame({idx: dict(zip(x['cNames'], x['cValues']) )
              for idx, x in df.iterrows()
            }).T.join(df.iloc[:,2:])

Output:

      a     b    c     d  number
0   1.0   2.0  3.0   NaN      10
1  55.0  66.0  NaN  77.0      20
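
As a hedged variation on the DataFrame construction above, the per-row dictionaries can also be built with a plain zip over the two list columns instead of iterrows(). This is only a sketch on the sample data; the names records and wide are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'cNames': [['a', 'b', 'c'], ['a', 'b', 'd']],
    'cValues': [[1, 2, 3], [55, 66, 77]],
    'number': [10, 20],
})

# One {name: value} dict per row; keys missing from a row become NaN.
records = [dict(zip(names, values))
           for names, values in zip(df['cNames'], df['cValues'])]

# Reuse the original index so the join lines up row by row.
wide = pd.DataFrame(records, index=df.index).join(df['number'])
print(wide)
```

Passing index=df.index matters when the original frame does not use the default 0..n-1 index; otherwise the join on index would misalign the rows.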

Update: performance, sorted by run time on the sample data

DataFrame construction:

%%timeit
pd.DataFrame({idx: dict(zip(x['cNames'], x['cValues']) )
              for idx, x in df.iterrows()
            }).T.join(df.iloc[:,2:])
1.29 ms ± 36.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

concat:

%%timeit
pd.concat([pd.Series(x['cValues'], x['cNames'], name=idx) 
               for idx, x in df.iterrows()], 
          axis=1
         ).T.join(df.iloc[:,2:])
2.03 ms ± 86.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

KJDII's new series

%%timeit
df['series'] = df.apply(lambda x: dict(zip(x['cNames'], x['cValues'])), axis=1)
pd.concat([df['number'], df['series'].apply(pd.Series)], axis=1)

2.09 ms ± 65.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Scott's apply(pd.Series.explode)

%%timeit
df.apply(pd.Series.explode)\
  .set_index(['number', 'cNames'], append=True)['cValues']\
  .unstack()\
  .reset_index()\
  .drop('level_0', axis=1)

4.9 ms ± 135 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

wwnde's set_index.apply(explode)

%%timeit
g=df.set_index('number').apply(lambda x: x.explode()).reset_index()
g['cValues']=g['cValues'].astype(int)
pd.pivot_table(g, index=["number"],values=["cValues"],columns=["cNames"]).droplevel(0, axis=1).reset_index()

7.27 ms ± 162 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Celius' double explode

%%timeit
df1 = df.explode('cNames').explode('cValues')
df1['cValues'] = pd.to_numeric(df1['cValues'])
df1.pivot_table(columns='cNames',index='number',values='cValues')

9.42 ms ± 189 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
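
The timings above were taken on the two-row sample, so the ranking may not carry over to larger frames. A minimal sketch for repeating the comparison outside IPython with the standard timeit module follows; the 500-fold repetition of the sample rows and the function names via_dataframe and via_concat are assumptions made for illustration, and only two of the approaches are wrapped here.

```python
import timeit
import pandas as pd

df = pd.DataFrame({
    'cNames': [['a', 'b', 'c'], ['a', 'b', 'd']],
    'cValues': [[1, 2, 3], [55, 66, 77]],
    'number': [10, 20],
})

def via_dataframe(frame):
    # DataFrame construction from a dict of per-row dicts.
    return pd.DataFrame({idx: dict(zip(x['cNames'], x['cValues']))
                         for idx, x in frame.iterrows()}).T.join(frame.iloc[:, 2:])

def via_concat(frame):
    # One Series per row, concatenated side by side and transposed.
    return pd.concat([pd.Series(x['cValues'], x['cNames'], name=idx)
                      for idx, x in frame.iterrows()],
                     axis=1).T.join(frame.iloc[:, 2:])

# Hypothetical larger input: the sample rows repeated 500 times.
big = pd.concat([df] * 500, ignore_index=True)

for fn in (via_dataframe, via_concat):
    seconds = timeit.timeit(lambda: fn(big), number=3)
    print(f"{fn.__name__}: {seconds / 3:.4f} s per call")
```

The other answers can be wrapped in functions of the same shape and added to the loop.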

answered Oct 19 '22 10:10

Quang Hoang

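You can explode() both list columns together and then pivot the table back to the desired output. Exploding them one after another with chained explode() calls pairs every name with every value in the row, so the pivot would show the mean of each row's values (2.0 and 66.0) in every column. Passing both columns to a single explode() call (pandas 1.3 or later) keeps names and values aligned:

df = df.explode(['cNames', 'cValues'])
df['cValues'] = pd.to_numeric(df['cValues'])
print(df.pivot_table(columns='cNames',index='number',values='cValues'))

Output:

cNames     a     b    c     d
number                       
10       1.0   2.0  3.0   NaN
20      55.0  66.0  NaN  77.0

Unfortunately, explode returns columns of dtype object, so cValues must be converted with pd.to_numeric() before pivoting. Otherwise there will be no numeric values to aggregate.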
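
To see how chained and joint explode differ on the sample data, the row counts can be compared directly. This is a hedged sketch that assumes pandas 1.3 or later (needed for passing a list of columns to explode()); the names chained, paired, and wide are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'cNames': [['a', 'b', 'c'], ['a', 'b', 'd']],
    'cValues': [[1, 2, 3], [55, 66, 77]],
    'number': [10, 20],
})

# Chained explode: each name is paired with every value of its row (3 x 3 per row).
chained = df.explode('cNames').explode('cValues')

# Joint explode: names and values are unpacked in lockstep (3 per row).
paired = df.explode(['cNames', 'cValues'])

print(len(chained), len(paired))

# explode leaves cValues as dtype object, so convert before aggregating.
paired['cValues'] = pd.to_numeric(paired['cValues'])
wide = paired.pivot_table(columns='cNames', index='number', values='cValues')
print(wide)
```

With the chained form, pivot_table averages three values per cell, which is why every cell in a row ends up holding the same number.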

answered Oct 19 '22 08:10

Celius Stingher


import pandas as pd

d = {'cNames': [['a','b','c'], ['a','b','d']], 'cValues': [[1,2,3], 
[55,66,77]], 'number': [10,20]}
df = pd.DataFrame(data=d)

df['series'] = df.apply(lambda x: dict(zip(x['cNames'], x['cValues'])), axis=1)
df = pd.concat([df['number'], df['series'].apply(pd.Series)], axis=1)
print(df)

   number     a     b    c     d
0      10   1.0   2.0  3.0   NaN
1      20  55.0  66.0  NaN  77.0

If column order matters:

columns = ['a', 'b', 'c', 'd', 'number']
df = df[columns]

      a     b    c     d  number
0   1.0   2.0  3.0   NaN      10
1  55.0  66.0  NaN  77.0      20
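
When the set of names is not known in advance, the column list does not have to be hard-coded. The sketch below moves number to the end whatever the other columns are; the frame wide is rebuilt by hand here only to keep the example self-contained, and ordered is an illustrative name.

```python
import pandas as pd

# Stand-in for the result above, with 'number' as the first column.
wide = pd.DataFrame({
    'number': [10, 20],
    'a': [1.0, 55.0],
    'b': [2.0, 66.0],
    'c': [3.0, None],
    'd': [None, 77.0],
})

# Keep every other column in its current order and put 'number' last.
ordered = wide[[c for c in wide.columns if c != 'number'] + ['number']]
print(ordered)
```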

answered Oct 19 '22 09:10

KJDII
