I have a dataset in the following format:
df = pd.DataFrame({'x':[1,2,3], 'y':[10,20,30], 'v1':[3,2,3] , 'v2':[13,25,31] })
>> v1 v2 x y
3 13 1 10
2 25 2 20
3 31 3 30
Using x as the index column, I want to flatten the data by combining v1 and v2 into a single column V. The expected output is:
>> x y V
1 10 3
1 10 13
2 20 2
2 20 25
3 30 3
3 30 31
I would then like to convert the result back to the original format of df. I tried reshaping with stack and unstack, but I couldn't get the result I was expecting.
Many Thanks!
One way to flatten a pandas DataFrame is through NumPy. Convert the DataFrame to an array with the to_numpy() method, then call the array's flatten() method (numpy.ndarray.flatten), which returns a one-dimensional copy of the data.
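As a rough sketch of the NumPy route applied to the question's data: the snippet below flattens only the v1 and v2 columns and then rebuilds the long frame by repeating x and y. The names flat and long_df are illustrative, and the explicit columns= argument is an assumption added only to pin the column order.

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3], 'y': [10, 20, 30],
                   'v1': [3, 2, 3], 'v2': [13, 25, 31]},
                  columns=['x', 'y', 'v1', 'v2'])

# Flatten the value columns in row-major order, so each row's
# v1 and v2 end up next to each other.
flat = df[['v1', 'v2']].to_numpy().flatten()

# flatten() discards x and y, so repeat each of them once per value column.
long_df = pd.DataFrame({
    'x': df['x'].to_numpy().repeat(2),
    'y': df['y'].to_numpy().repeat(2),
    'V': flat,
})
print(long_df)
```

Note that this approach is not directly reversible, because the information about which value came from v1 and which from v2 is only implied by row position.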
Options for flattening a MultiIndex: for columns, use get_level_values(), use to_flat_index(), or join the column labels; for rows, flatten all levels.
To slice the columns, the syntax is df.loc[:, start:stop:step], where start is the label of the first column to take, stop is the label of the last column to take (label slices include the end point), and step is the number of positions to advance after each selection; a step of 2, for example, selects alternate columns.
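A small illustration of label-based column slicing on the question's frame, assuming the column order x, y, v1, v2 (fixed here with an explicit columns= argument); value_cols and alternate are illustrative names:

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3], 'y': [10, 20, 30],
                   'v1': [3, 2, 3], 'v2': [13, 25, 31]},
                  columns=['x', 'y', 'v1', 'v2'])

# Label slices with .loc include both end points.
value_cols = df.loc[:, 'v1':'v2']

# A step of 2 takes every other column between the end points.
alternate = df.loc[:, 'x':'v2':2]

print(list(value_cols.columns))  # ['v1', 'v2']
print(list(alternate.columns))   # ['x', 'v1']
```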
The two major sort functions are sort_values() and sort_index(). Use sort_values() to sort a DataFrame by one or more columns, and sort_index() to sort it by the row index. See the pandas documentation for details on their parameters.
pd.lreshape can reformat wide data to long format:
In [55]: pd.lreshape(df, {'V':['v1', 'v2']})
Out[57]:
x y V
0 1 10 3
1 2 20 2
2 3 30 3
3 1 10 13
4 2 20 25
5 3 30 31
lreshape is an undocumented "experimental" feature. To learn more about lreshape, see help(pd.lreshape).
If you need reversible operations, use jezrael's pd.melt solution to go from wide to long format, and use pivot_table to go from long to wide format:
In [72]: melted = pd.melt(df, id_vars=['x', 'y'], value_name='V'); melted
Out[72]:
x y variable V
0 1 10 v1 3
1 2 20 v1 2
2 3 30 v1 3
3 1 10 v2 13
4 2 20 v2 25
5 3 30 v2 31
In [74]: df2 = melted.pivot_table(index=['x','y'], columns=['variable'], values='V').reset_index(); df2
Out[74]:
variable x y v1 v2
0 1 10 3 13
1 2 20 2 25
2 3 30 3 31
Notice that you must hang on to the variable column if you wish to return to df2. Also keep in mind that it is more efficient to simply retain a reference to df than to recompute it using melted and pivot_table.
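To check that the round trip really recovers the original values, something like the following could be used. It is a sketch, not part of the original answer: restored and same are illustrative names, and the comparison is made on values rather than dtypes because pivot_table aggregates with the mean by default and may return floats.

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3], 'y': [10, 20, 30],
                   'v1': [3, 2, 3], 'v2': [13, 25, 31]},
                  columns=['x', 'y', 'v1', 'v2'])

# Wide to long, keeping the 'variable' column so the step can be undone.
melted = pd.melt(df, id_vars=['x', 'y'], value_name='V')

# Long to wide again.
restored = (melted
            .pivot_table(index=['x', 'y'], columns='variable', values='V')
            .reset_index())
restored.columns.name = None  # drop the leftover 'variable' axis label

# Compare labels and values; 3.0 == 3, so float results still match.
same = (list(restored.columns) == list(df.columns)
        and bool((restored.to_numpy() == df.to_numpy()).all()))
print(same)
```

Because pivot_table aggregates, rows that share the same (x, y) pair would be averaged together, so the round trip is only faithful when each (x, y) combination is unique, as it is in this example.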
You can use stack with set_index, then drop the level_2 column:
print (df.set_index(['x','y']).stack().reset_index(name='V').drop('level_2', axis=1))
x y V
0 1 10 3
1 1 10 13
2 2 20 2
3 2 20 25
4 3 30 3
5 3 30 31
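The question also asks how to get back to the wide format. One possible way, sketched below under the assumption that each (x, y) pair is unique, is to keep the stacked Series around and call unstack on it; stacked, long_df, and wide are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3], 'y': [10, 20, 30],
                   'v1': [3, 2, 3], 'v2': [13, 25, 31]},
                  columns=['x', 'y', 'v1', 'v2'])

# Series with a three-level index: x, y, and the former column label.
stacked = df.set_index(['x', 'y']).stack()

# Long format; the unnamed third level becomes the column 'level_2'.
long_df = stacked.reset_index(name='V')

# Reverse step: unstack moves the innermost index level back to columns.
wide = stacked.unstack().reset_index()
print(wide)
```

If the level_2 column has already been dropped from the long frame, the information needed by unstack is gone, so the reverse step has to start from stacked (or from a long frame that still contains that column).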
Another solution with melt and sort_values:
print (pd.melt(df, id_vars=['x','y'], value_name='V')
.drop('variable', axis=1)
.sort_values('x'))
x y V
0 1 10 3
3 1 10 13
1 2 20 2
4 2 20 25
2 3 30 3
5 3 30 31
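One caveat with this approach: the default sorting algorithm of sort_values is not guaranteed to be stable, so on larger data the v1 and v2 rows for the same x might not keep their relative order. A hedged variant that requests a stable sort and renumbers the index could look like this (long_df is an illustrative name):

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3], 'y': [10, 20, 30],
                   'v1': [3, 2, 3], 'v2': [13, 25, 31]},
                  columns=['x', 'y', 'v1', 'v2'])

long_df = (pd.melt(df, id_vars=['x', 'y'], value_name='V')
           .drop(columns='variable')
           # mergesort is stable: rows with equal x keep their melted
           # order, so each v1 value stays ahead of its v2 value.
           .sort_values('x', kind='mergesort')
           .reset_index(drop=True))
print(long_df['V'].tolist())  # [3, 13, 2, 25, 3, 31]
```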