I have the following pandas df:
   original      mean
0  0.000000  0.065500
1  0.131000  0.135890
2  0.140779  0.144875
3  0.148971  0.150029
4  0.151088  0.144309
How can I merge the two columns into a single column, interleaving the values row by row, like this:
   original
0  0.000000
1  0.065500
2  0.131000
3  0.135890
4  0.140779
5  0.144875
6  0.148971
7  0.150029
8  0.151088
9  0.144309
Use the stack() method:
In [2]: df
Out[2]:
   original      mean
0  0.000000  0.065500
1  0.131000  0.135890
2  0.140779  0.144875
3  0.148971  0.150029
4  0.151088  0.144309
In [3]: df.stack()
Out[3]:
0  original    0.000000
   mean        0.065500
1  original    0.131000
   mean        0.135890
2  original    0.140779
   mean        0.144875
3  original    0.148971
   mean        0.150029
4  original    0.151088
   mean        0.144309
dtype: float64
In [4]: df.stack().reset_index(level=[0,1], drop=True)
Out[4]:
0 0.000000
1 0.065500
2 0.131000
3 0.135890
4 0.140779
5 0.144875
6 0.148971
7 0.150029
8 0.151088
9 0.144309
dtype: float64
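For anyone who wants to run this outside an IPython session, here is a minimal self-contained sketch of the same idea. The frame is rebuilt from the values in the question; the final `to_frame(name="original")` step is an addition here (not part of the transcript above) to get back a one-column DataFrame rather than a Series.

```python
import pandas as pd

# Rebuild the frame from the question.
df = pd.DataFrame({
    "original": [0.000000, 0.131000, 0.140779, 0.148971, 0.151088],
    "mean": [0.065500, 0.135890, 0.144875, 0.150029, 0.144309],
})

# stack() emits the values row by row (original, mean, original, mean, ...),
# indexed by a (row label, column name) MultiIndex.
stacked = df.stack()

# Drop the MultiIndex to get a plain 0..9 index, then name the single column.
result = stacked.reset_index(drop=True).to_frame(name="original")
print(result)
```

Passing `drop=True` without `level` discards every index level, so it should behave the same as `level=[0,1]` for this two-level index.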
You can call reshape on the values and construct another df:
In [7]:
pd.DataFrame(data=df.values.reshape(df.shape[0]*2,-1), columns=['original'])
Out[7]:
   original
0  0.000000
1  0.065500
2  0.131000
3  0.135890
4  0.140779
5  0.144875
6  0.148971
7  0.150029
8  0.151088
9  0.144309
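A small variant of the reshape idea, as a sketch rather than the answer's exact code: flattening with `ravel()` avoids hard-coding the factor of 2, so it should also work if the frame has more than two numeric columns. It relies on NumPy's default row-major order, which is what produces the original/mean interleaving.

```python
import pandas as pd

# Rebuild the frame from the question.
df = pd.DataFrame({
    "original": [0.000000, 0.131000, 0.140779, 0.148971, 0.151088],
    "mean": [0.065500, 0.135890, 0.144875, 0.150029, 0.144309],
})

# to_numpy() gives a (5, 2) array; ravel() flattens it in row-major order,
# i.e. row 0's values first, then row 1's, and so on.
flat = df.to_numpy().ravel()

# Wrap the flat array in a new one-column frame with a fresh 0..9 index.
result = pd.DataFrame({"original": flat})
print(result)
```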
Timings
On your sample dataset:
In [8]:
%timeit df.stack().reset_index(level=[0,1], drop=True)
%timeit pd.DataFrame(data=df.values.reshape(df.shape[0]*2,-1), columns=['original'])
1000 loops, best of 3: 820 µs per loop
1000 loops, best of 3: 446 µs per loop
Reshaping the underlying NumPy array is nearly twice as fast here (446 µs vs. 820 µs per loop).
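On a five-row frame, fixed per-call overhead likely dominates both timings, so it may be worth repeating the comparison on something larger. The sketch below uses the standard `timeit` module on a hypothetical 10,000-row frame of random floats (not the asker's data); the helper names `via_stack` and `via_reshape` are made up for illustration. No numbers are quoted because they depend on the machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical larger frame: 10,000 rows of random floats, same column names.
rng = np.random.default_rng(0)
big = pd.DataFrame(rng.random((10_000, 2)), columns=["original", "mean"])

def via_stack(frame):
    # stack() then drop the (row, column) MultiIndex.
    return frame.stack().reset_index(drop=True)

def via_reshape(frame):
    # Flatten the underlying array into a single column.
    return pd.DataFrame(frame.to_numpy().reshape(-1, 1), columns=["original"])

# Sanity check: both approaches yield the same values in the same order.
same = np.array_equal(
    via_stack(big).to_numpy(),
    via_reshape(big)["original"].to_numpy(),
)

# Total seconds for 20 calls of each approach.
t_stack = timeit.timeit(lambda: via_stack(big), number=20)
t_reshape = timeit.timeit(lambda: via_reshape(big), number=20)

print(f"same values: {same}")
print(f"stack:   {t_stack:.4f} s")
print(f"reshape: {t_reshape:.4f} s")
```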