 

Transpose dataframe based on column list

I have a dataframe in the following structure:

cNames  | cValues   |  number  
[a,b,c] | [1,2,3]   |  10      
[a,b,d] | [55,66,77]|  20

I would like to transpose it, i.e. create columns from the names in cNames.
But I can't manage to achieve this with transpose, because I want a separate column for each value in the list.
The needed output:

a   | b   | c   | d   |  number
1   | 2   | 3   | NaN | 10
55  | 66  | NaN | 77  | 20

How can I achieve this result?
Thanks!

The code to create the DF:

import pandas as pd

d = {'cNames': [['a','b','c'], ['a','b','d']],
     'cValues': [[1,2,3], [55,66,77]],
     'number': [10,20]}
df = pd.DataFrame(data=d)
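For illustration, a plain transpose only swaps rows and columns: each list stays packed in a single cell, which is why `.T` alone cannot produce one column per name. A minimal check, rebuilding the question's `df` (the name `t` is just for this sketch):

```python
import pandas as pd

d = {'cNames': [['a', 'b', 'c'], ['a', 'b', 'd']],
     'cValues': [[1, 2, 3], [55, 66, 77]],
     'number': [10, 20]}
df = pd.DataFrame(data=d)

# .T swaps the axes: the old column labels become the row index,
# but every cNames/cValues cell still holds a whole Python list.
t = df.T
print(t)
```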
asked Feb 05 '21 at 21:02 by Dave




3 Answers

One option is concat:

pd.concat([pd.Series(x['cValues'], x['cNames'], name=idx) 
               for idx, x in df.iterrows()], 
          axis=1
         ).T.join(df.iloc[:,2:])

Or a DataFrame construction:

pd.DataFrame({idx: dict(zip(x['cNames'], x['cValues']) )
              for idx, x in df.iterrows()
            }).T.join(df.iloc[:,2:])

Output:

      a     b    c     d  number
0   1.0   2.0  3.0   NaN      10
1  55.0  66.0  NaN  77.0      20
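A related sketch (not part of the original answer) builds the same table from a list of dicts instead of iterating with `iterrows`, assuming the question's `df`. The names `records` and `out` are illustrative:

```python
import pandas as pd

d = {'cNames': [['a', 'b', 'c'], ['a', 'b', 'd']],
     'cValues': [[1, 2, 3], [55, 66, 77]],
     'number': [10, 20]}
df = pd.DataFrame(data=d)

# One dict per row, mapping each name to its value; zip pairs them by position.
records = [dict(zip(names, values))
           for names, values in zip(df['cNames'], df['cValues'])]

# The constructor turns dict keys into columns and fills missing keys with NaN.
# Reusing df.index keeps the rows aligned for the join with 'number'.
out = pd.DataFrame(records, index=df.index).join(df[['number']])
print(out)
```

Unlike the `.T` route, columns `a` and `b` stay integers here, because only the columns that actually contain a missing value (`c` and `d`) are upcast to float.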

Update: performance comparison on the sample data, sorted by run time

DataFrame

%%timeit
pd.DataFrame({idx: dict(zip(x['cNames'], x['cValues']) )
              for idx, x in df.iterrows()
            }).T.join(df.iloc[:,2:])
1.29 ms ± 36.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

concat:

%%timeit
pd.concat([pd.Series(x['cValues'], x['cNames'], name=idx) 
               for idx, x in df.iterrows()], 
          axis=1
         ).T.join(df.iloc[:,2:])
2.03 ms ± 86.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) 

KJDII's new series

%%timeit
df['series'] = df.apply(lambda x: dict(zip(x['cNames'], x['cValues'])), axis=1)
pd.concat([df['number'], df['series'].apply(pd.Series)], axis=1)

2.09 ms ± 65.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Scott's apply(pd.Series.explode)

%%timeit
df.apply(pd.Series.explode)\
  .set_index(['number', 'cNames'], append=True)['cValues']\
  .unstack()\
  .reset_index()\
  .drop('level_0', axis=1)

4.9 ms ± 135 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

wwnde's set_index.apply(explode)

%%timeit
g=df.set_index('number').apply(lambda x: x.explode()).reset_index()
g['cValues']=g['cValues'].astype(int)
pd.pivot_table(g, index=["number"],values=["cValues"],columns=["cNames"]).droplevel(0, axis=1).reset_index()

7.27 ms ± 162 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

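Celius' double explode (chaining the two explodes pairs every name with every value of its row, so this version returns averaged values rather than the requested ones)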

%%timeit
df1 = df.explode('cNames').explode('cValues')
df1['cValues'] = pd.to_numeric(df1['cValues'])
df1.pivot_table(columns='cNames',index='number',values='cValues')

9.42 ms ± 189 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
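The `%%timeit` cells above are IPython/Jupyter cell magics and will not run in a plain Python script. A rough equivalent with the standard-library `timeit` module, sketched for the first two approaches (the function names `via_dataframe` and `via_concat` are hypothetical wrappers around the answer's code; absolute numbers will differ by machine and pandas version):

```python
import timeit

import pandas as pd

d = {'cNames': [['a', 'b', 'c'], ['a', 'b', 'd']],
     'cValues': [[1, 2, 3], [55, 66, 77]],
     'number': [10, 20]}
df = pd.DataFrame(data=d)


def via_dataframe():
    # The "DataFrame construction" approach from this answer.
    return pd.DataFrame({idx: dict(zip(x['cNames'], x['cValues']))
                         for idx, x in df.iterrows()}).T.join(df.iloc[:, 2:])


def via_concat():
    # The pd.concat approach from this answer.
    return pd.concat([pd.Series(x['cValues'], x['cNames'], name=idx)
                      for idx, x in df.iterrows()],
                     axis=1).T.join(df.iloc[:, 2:])


results = {}
for name, func in [('DataFrame', via_dataframe), ('concat', via_concat)]:
    # repeat() returns the total seconds for each batch of `number` calls.
    runs = timeit.repeat(func, number=50, repeat=3)
    results[name] = min(runs) / 50  # best per-call time, in seconds
    print(f'{name}: {results[name] * 1e3:.3f} ms per call')
```

Taking the minimum of the repeats, rather than the mean, follows the `timeit` documentation's advice that higher values are usually caused by other processes interfering.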
answered Oct 19 '22 at 10:10 by Quang Hoang



You can explode both list columns together and then pivot the table back to the desired output (exploding several columns in one call requires pandas 1.3 or later):

df = df.explode(['cNames', 'cValues'])
df['cValues'] = pd.to_numeric(df['cValues'])
print(df.pivot_table(columns='cNames', index='number', values='cValues'))

Output:

cNames     a     b    c     d
number                       
10       1.0   2.0  3.0   NaN
20      55.0  66.0  NaN  77.0

Unfortunately, explode returns columns of dtype object, so we must first convert cValues with pd.to_numeric() before pivoting. Otherwise there are no numeric values to aggregate.
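Exploding the two list columns one after the other is not equivalent to exploding them together: the second call repeats every already-exploded name once per value, producing a cross product within each row. A quick row-count check on the question's data (multi-column `explode` assumes pandas 1.3 or later; `chained` and `paired` are illustrative names):

```python
import pandas as pd

d = {'cNames': [['a', 'b', 'c'], ['a', 'b', 'd']],
     'cValues': [[1, 2, 3], [55, 66, 77]],
     'number': [10, 20]}
df = pd.DataFrame(data=d)

# One column at a time: every name is paired with every value of its row,
# i.e. 2 rows * 3 names * 3 values = 18 rows.
chained = df.explode('cNames').explode('cValues')

# Both columns in one call: elements are paired by position,
# i.e. 2 rows * 3 pairs = 6 rows.
paired = df.explode(['cNames', 'cValues'])

print(len(chained), len(paired))
print(paired[['cNames', 'cValues', 'number']])
```

Averaging the 18-row version per name is what turns every value of the first row into 2.0 (the mean of 1, 2 and 3).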

answered Oct 19 '22 at 08:10 by Celius Stingher




import pandas as pd

d = {'cNames': [['a','b','c'], ['a','b','d']],
     'cValues': [[1,2,3], [55,66,77]],
     'number': [10,20]}
df = pd.DataFrame(data=d)

df['series'] = df.apply(lambda x: dict(zip(x['cNames'], x['cValues'])), axis=1)
df = pd.concat([df['number'], df['series'].apply(pd.Series)], axis=1)
print(df)

   number     a     b    c     d
0      10   1.0   2.0  3.0   NaN
1      20  55.0  66.0  NaN  77.0

If column order matters:


columns = ['a', 'b', 'c', 'd', 'number']
df = df[columns]

      a     b    c     d  number
0   1.0   2.0  3.0   NaN      10
1  55.0  66.0  NaN  77.0      20
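Hard-coding the column list only works when the names are known in advance. A sketch that moves `number` to the end whatever names appear in cNames (the input frame is typed in by hand here to mirror the concat output above; `cols` is an illustrative name):

```python
import pandas as pd

# Stand-in for the concat result, with 'number' as the first column.
df = pd.DataFrame({'number': [10, 20],
                   'a': [1.0, 55.0], 'b': [2.0, 66.0],
                   'c': [3.0, None], 'd': [None, 77.0]})

# Keep every other column in its current order and append 'number' last,
# without listing the letter columns explicitly.
cols = [c for c in df.columns if c != 'number'] + ['number']
df = df[cols]
print(df)
```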


answered Oct 19 '22 at 09:10 by KJDII
