I have a dataframe in the following structure:
cNames | cValues | number
[a,b,c] | [1,2,3] | 10
[a,b,d] | [55,66,77] | 20
I would like to transpose it, that is, create columns from the names in cNames.
I can't achieve this with transpose() because I want one column for each distinct name in the lists, filled with the matching entry from cValues.
The needed output:
a | b | c | d | number
1 | 2 | 3 | NaN | 10
55 | 66 | NaN | 77 | 20
How can I achieve this result?
Thanks!
The code to create the DF:
import pandas as pd

d = {'cNames': [['a','b','c'], ['a','b','d']],
     'cValues': [[1,2,3], [55,66,77]],
     'number': [10,20]}
df = pd.DataFrame(data=d)
One option is concat:
pd.concat([pd.Series(x['cValues'], x['cNames'], name=idx)
for idx, x in df.iterrows()],
axis=1
).T.join(df.iloc[:,2:])
Or a DataFrame construction:
pd.DataFrame({idx: dict(zip(x['cNames'], x['cValues']) )
for idx, x in df.iterrows()
}).T.join(df.iloc[:,2:])
Output:
a b c d number
0 1.0 2.0 3.0 NaN 10
1 55.0 66.0 NaN 77.0 20
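The same idea can also be sketched without iterrows, by building one dict per row and letting the DataFrame constructor align the keys. This is only a sketch, assuming each cNames list has the same length as its cValues list; the names records and out are illustrative and not part of the answer above.

```python
import pandas as pd

d = {'cNames': [['a', 'b', 'c'], ['a', 'b', 'd']],
     'cValues': [[1, 2, 3], [55, 66, 77]],
     'number': [10, 20]}
df = pd.DataFrame(data=d)

# One {name: value} dict per row; zip pairs each name with its value.
records = [dict(zip(names, values))
           for names, values in zip(df['cNames'], df['cValues'])]

# The constructor uses the union of the keys as columns and fills gaps with NaN.
# Reusing df.index keeps the rows aligned for the join with 'number'.
out = pd.DataFrame(records, index=df.index).join(df['number'])
print(out)
```

Columns with no missing entries may keep an integer dtype here, so the printed values can differ in formatting from the output shown above.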
Update: performance of the answers, sorted by run time on the sample data
DataFrame construction:
%%timeit
pd.DataFrame({idx: dict(zip(x['cNames'], x['cValues']) )
for idx, x in df.iterrows()
}).T.join(df.iloc[:,2:])
1.29 ms ± 36.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
concat:
%%timeit
pd.concat([pd.Series(x['cValues'], x['cNames'], name=idx)
for idx, x in df.iterrows()],
axis=1
).T.join(df.iloc[:,2:])
2.03 ms ± 86.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
KJDII's new series
%%timeit
df['series'] = df.apply(lambda x: dict(zip(x['cNames'], x['cValues'])), axis=1)
pd.concat([df['number'], df['series'].apply(pd.Series)], axis=1)
2.09 ms ± 65.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Scott's apply(pd.Series.explode)
%%timeit
df.apply(pd.Series.explode)\
.set_index(['number', 'cNames'], append=True)['cValues']\
.unstack()\
.reset_index()\
.drop('level_0', axis=1)
4.9 ms ± 135 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
wwnde's set_index.apply(explode)
%%timeit
g=df.set_index('number').apply(lambda x: x.explode()).reset_index()
g['cValues']=g['cValues'].astype(int)
pd.pivot_table(g, index=["number"],values=["cValues"],columns=["cNames"]).droplevel(0, axis=1).reset_index()
7.27 ms ± 162 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Celius' double explode
%%timeit
df1 = df.explode('cNames').explode('cValues')
df1['cValues'] = pd.to_numeric(df1['cValues'])
df1.pivot_table(columns='cNames',index='number',values='cValues')
9.42 ms ± 189 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
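You can explode both list columns together and then pivot the table back to the desired output. Passing a list of columns to explode() requires pandas 1.3 or later; exploding the two columns one after the other would instead pair every name with every value.
df = df.explode(['cNames', 'cValues'])
df['cValues'] = pd.to_numeric(df['cValues'])
print(df.pivot_table(columns='cNames',index='number',values='cValues'))
Output:
cNames a b c d
number
10 1.0 2.0 3.0 NaN
20 55.0 66.0 NaN 77.0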
Unfortunately, the exploded column has dtype object,
so we must convert it with pd.to_numeric()
before pivoting. Otherwise there will be no numeric values to aggregate.
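When exploding two list columns, it matters whether names and values stay paired. A small sketch, assuming pandas 1.3 or later (needed for passing a list of columns to explode); the names sequential and paired are illustrative:

```python
import pandas as pd

d = {'cNames': [['a', 'b', 'c'], ['a', 'b', 'd']],
     'cValues': [[1, 2, 3], [55, 66, 77]],
     'number': [10, 20]}
df = pd.DataFrame(data=d)

# Exploding one column at a time repeats every value for every name.
sequential = df.explode('cNames').explode('cValues')

# Exploding both columns in one call keeps each name next to its own value.
paired = df.explode(['cNames', 'cValues'])

print(len(sequential))  # 18 rows: 2 original rows x 3 names x 3 values
print(len(paired))      # 6 rows: one per (name, value) pair
```

With the sequential form, a pivot_table using the default mean aggregation averages all three values for every name, which is why the paired form is the one that reproduces the requested table.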
import pandas as pd
d = {'cNames': [['a','b','c'], ['a','b','d']], 'cValues': [[1,2,3],
[55,66,77]], 'number': [10,20]}
df = pd.DataFrame(data=d)
df['series'] = df.apply(lambda x: dict(zip(x['cNames'], x['cValues'])), axis=1)
df = pd.concat([df['number'], df['series'].apply(pd.Series)], axis=1)
print(df)
number a b c d
0 10 1.0 2.0 3.0 NaN
1 20 55.0 66.0 NaN 77.0
If column order matters:
columns = ['a', 'b', 'c', 'd', 'number']
df = df[columns]
a b c d number
0 1.0 2.0 3.0 NaN 10
1 55.0 66.0 NaN 77.0 20
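If the names are not known in advance, the column list does not have to be hard-coded. A minimal sketch, assuming the only requirement is that number ends up last:

```python
import pandas as pd

d = {'cNames': [['a', 'b', 'c'], ['a', 'b', 'd']],
     'cValues': [[1, 2, 3], [55, 66, 77]],
     'number': [10, 20]}
df = pd.DataFrame(data=d)
df['series'] = df.apply(lambda x: dict(zip(x['cNames'], x['cValues'])), axis=1)
df = pd.concat([df['number'], df['series'].apply(pd.Series)], axis=1)

# Keep every column except 'number' in its current order, then append 'number'.
columns = [c for c in df.columns if c != 'number'] + ['number']
df = df[columns]
print(df.columns.tolist())
```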