I have a Dataframe like:
matrix = [(222, ['A','B','C'], [1,2,3]),
(333, ['A','B','D'], [1,3,5])]
df = pd.DataFrame(matrix, columns=['timestamp', 'variable', 'value'])
timestamp variable value
222 ['A','B','C'] [1,2,3]
333 ['A','B','D'] [1,3,5]
and would like to pivot it so that the timestamp
value is kept, the unique values in the variable
column become additional columns, and values from value
are sorted in the respective columns.
The output should look as follows:
timestamp A B C D
222 1 2 3 nan
333 1 3 nan 5
any help would be greatly appreciated! :)
Using unnest first , then just pivot
unnesting(df,['variable','value']).pivot(*df.columns)
Out[79]:
variable A B C D
timestamp
222 1.0 2.0 3.0 NaN
333 1.0 3.0 NaN 5.0
def unnesting(df, explode):
idx = df.index.repeat(df[explode[0]].str.len())
df1 = pd.concat([
pd.DataFrame({x: np.concatenate(df[x].values)}) for x in explode], axis=1)
df1.index = idx
return df1.join(df.drop(explode, 1), how='left')
Create dictionary with zip, pass to DataFrame
constructor:
a = [dict(zip(*x)) for x in zip(df['variable'], df['value'])]
print (a)
[{'A': 1, 'B': 2, 'C': 3}, {'A': 1, 'B': 3, 'D': 5}]
df = df[['timestamp']].join(pd.DataFrame(a, index=df.index))
print (df)
timestamp A B C D
0 222 1 2 3.0 NaN
1 333 1 3 NaN 5.0
If many another columns use DataFrame.pop
for extract columns:
a = [dict(zip(*x)) for x in zip(df.pop('variable'), df.pop('value'))]
df = df.join(pd.DataFrame(a, index=df.index))
print (df)
timestamp A B C D
0 222 1 2 3.0 NaN
1 333 1 3 NaN 5.0
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With