Pandas Flatten a dataframe to a single column

Tags:

python

python-3.x

pandas

I have a dataset in the following format:

df = pd.DataFrame({'x':[1,2,3], 'y':[10,20,30], 'v1':[3,2,3] , 'v2':[13,25,31] })

>> v1 v2  x   y
   3  13  1  10
   2  25  2  20
   3  31  3  30

With x as the index column, I want to flatten the data by combining v1 and v2 into a single column (V). The expected output is:

>> x   y   V
   1  10   3
   1  10   13
   2  20   2
   2  20   25
   3  30   3
   3  30   31

I also want to be able to convert the result back to the original format of df. I tried reshaping with stack and unstack, but I couldn't get the result I expected.

Many Thanks!

233

asked Jul 27 '16 11:07

NMSD


2 Answers

pd.lreshape can reformat wide data to long format:

In [55]: pd.lreshape(df, {'V':['v1', 'v2']})
Out[55]:
   x   y   V
0  1  10   3
1  2  20   2
2  3  30   3
3  1  10  13
4  2  20  25
5  3  30  31

At the time of writing, lreshape was an undocumented "experimental" feature. To learn more about lreshape, see help(pd.lreshape).

If you need reversible operations, use jezrael's pd.melt solution to go from wide to long format, and use pivot_table to go from long to wide format:

In [72]: melted = pd.melt(df, id_vars=['x', 'y'], value_name='V'); melted
Out[72]: 
   x   y variable   V
0  1  10       v1   3
1  2  20       v1   2
2  3  30       v1   3
3  1  10       v2  13
4  2  20       v2  25
5  3  30       v2  31

In [74]: df2 = melted.pivot_table(index=['x','y'], columns=['variable'], values='V').reset_index(); df2
Out[74]: 
variable  x   y  v1  v2
0         1  10   3  13
1         2  20   2  25
2         3  30   3  31

Note that you must keep the variable column if you want to get back to the wide format (df2). Also keep in mind that it is more efficient to simply retain a reference to df than to recompute it using melted and pivot_table.
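
The round trip described above can be sketched as follows. This is a minimal sketch, assuming a recent pandas version and that each (x, y) pair occurs only once. Passing aggfunc='first' is an addition that is not in the original answer: pivot_table averages values by default, which is unnecessary here. The names wide and restored are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3], 'y': [10, 20, 30],
                   'v1': [3, 2, 3], 'v2': [13, 25, 31]})

# Wide -> long: keep the 'variable' column so the step stays reversible.
melted = pd.melt(df, id_vars=['x', 'y'], value_name='V')

# Long -> wide: aggfunc='first' (an assumption of this sketch) picks the
# single value per cell instead of pivot_table's default mean.
wide = (melted.pivot_table(index=['x', 'y'], columns='variable',
                           values='V', aggfunc='first')
              .reset_index())
wide.columns.name = None  # drop the leftover 'variable' label on the columns

# Put the columns back in the original order before comparing with df.
restored = wide[df.columns.tolist()]
print(restored)
```

If (x, y) pairs can repeat, 'first' silently keeps only one value per cell, so an extra row identifier would be needed in index to make the round trip lossless.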

159

answered Oct 28 '22 17:10

unutbu

You can use stack with set_index, then drop the level_2 column:

print (df.set_index(['x','y']).stack().reset_index(name='V').drop('level_2', axis=1))
   x   y   V
0  1  10   3
1  1  10  13
2  2  20   2
3  2  20  25
4  3  30   3
5  3  30  31
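
Since the question also asks about getting back to the original layout, here is a hedged sketch of a stack/unstack round trip. It assumes each (x, y) pair is unique (unstack raises an error on duplicate index entries). The level name 'variable' and the names long and back are illustrative choices, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3], 'y': [10, 20, 30],
                   'v1': [3, 2, 3], 'v2': [13, 25, 31]})

# Wide -> long: stack moves the v1/v2 column labels into a third index level.
long = (df.set_index(['x', 'y'])
          .stack()
          .rename_axis(['x', 'y', 'variable'])  # name the new level explicitly
          .rename('V'))

# Long -> wide: unstack the named level back into columns.
back = long.unstack('variable').reset_index()
back.columns.name = None  # remove the 'variable' label left on the columns

print(long.reset_index())
print(back)
```

Keeping the stacked level in the index, rather than dropping it, is what makes the reverse step possible.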

Another solution with melt and sort_values:

print (pd.melt(df, id_vars=['x','y'], value_name='V')
         .drop('variable', axis=1)
         .sort_values('x'))

   x   y   V
0  1  10   3
3  1  10  13
1  2  20   2
4  2  20  25
2  3  30   3
5  3  30  31
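
If the row order and a clean 0..n-1 index matter, a variant of the melt approach can make both explicit. This is only a sketch: it assumes a pandas version whose sort_values accepts ignore_index (1.0 or later), and kind='mergesort' is an addition here that requests a stable sort so that the v1 value stays ahead of the v2 value within each x.

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3], 'y': [10, 20, 30],
                   'v1': [3, 2, 3], 'v2': [13, 25, 31]})

flat = (pd.melt(df, id_vars=['x', 'y'], value_name='V')
          .drop(columns='variable')
          # Stable sort keeps melt's v1-before-v2 order inside each x group;
          # ignore_index=True renumbers the rows from 0.
          .sort_values('x', kind='mergesort', ignore_index=True))
print(flat)
```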

answered Oct 28 '22 18:10

jezrael
