
Python Pandas - Combining Multiple Columns into one Staggered Column

How do you combine multiple columns into one staggered column? For example, if I have data:

  Column 1 Column 2
0        A        E
1        B        F
2        C        G
3        D        H

And I want it in the form:

  Column 1 
0        A       
1        E       
2        B       
3        F       
4        C       
5        G       
6        D       
7        H     

What is a good, vectorized, Pythonic way to do this? I could probably hack something together with df.apply(), but I'm betting there is a better way. The application is putting multiple dimensions of time series data into a single stream for ML applications.

sfortney asked Oct 26 '25 10:10


2 Answers

First stack the columns, then drop the resulting MultiIndex:

df.stack().reset_index(drop=True)
Out: 
0    A
1    E
2    B
3    F
4    C
5    G
6    D
7    H
dtype: object
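
As a quick check, here is a minimal, self-contained sketch of the same idea. It rebuilds the frame from the question (the names df, staggered and staggered_df are only illustrative) and shows that stacking reads the values row by row, which is what produces the staggered order.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({"Column 1": list("ABCD"), "Column 2": list("EFGH")})

# stack() moves the column labels into an inner index level, so the
# values are read row by row; reset_index(drop=True) then discards
# that MultiIndex and numbers the result 0..7.
staggered = df.stack().reset_index(drop=True)

# Optional: turn the Series into a one-column DataFrame.
staggered_df = staggered.to_frame("Column 1")

print(staggered.tolist())
```

The result follows the frame's column order, so reordering the columns first (for example df[['Column 2', 'Column 1']]) changes which value comes first in each pair.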

To get a DataFrame:

pd.DataFrame(df.values.reshape(-1, 1), columns=['Column 1'])


To get a Series, which is what the OP asked for:

pd.Series(df.values.flatten(), name='Column 1')
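
The NumPy route works because the frame's underlying 2-D array is flattened in row-major order by default. The sketch below spells that out on the question's data. It uses to_numpy() and ravel() in place of .values and flatten(), which is my substitution rather than the answer's exact code, and the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"Column 1": list("ABCD"), "Column 2": list("EFGH")})

# The underlying array has shape (4, 2). NumPy flattens in row-major
# (C) order by default, so each row's values stay next to each other.
values = df.to_numpy()

as_series = pd.Series(values.ravel(), name="Column 1")
as_frame = pd.DataFrame(values.reshape(-1, 1), columns=["Column 1"])

# For contrast, order="F" reads column by column: A, B, C, D, E, ...
column_major = values.ravel(order="F")

print(as_series.tolist())
```

If the columns have different dtypes, the combined array falls back to a common dtype (often object), so this approach is most natural when all columns hold the same kind of data.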

The Series built in the timing tests:

pd.Series(get_df(n).values.flatten(), name='Column 1')

Timing

code

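import pandas as pd

def get_df(n=1):
    # Column 1 comes first so the flattened order is A, E, B, F, ...
    df = pd.DataFrame({'Column 1': {0: 'A', 1: 'B', 2: 'C', 3: 'D'},
                       'Column 2': {0: 'E', 1: 'F', 2: 'G', 3: 'H'}})
    # Repeat the sample n times to scale it up for timing.
    return pd.concat([df for _ in range(n)])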
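
To reproduce a comparison like this yourself, something along the following lines should work. It is only a sketch: via_stack, via_flatten and time_both are hypothetical helper names, the get_df here is a simplified variant that passes ignore_index=True to keep the row labels unique, and the numbers will depend on your machine and pandas version.

```python
import timeit

import pandas as pd


def get_df(n=1):
    # Simplified variant of the sample builder; ignore_index=True is an
    # assumption made here so the repeated rows get unique labels.
    df = pd.DataFrame({"Column 1": list("ABCD"), "Column 2": list("EFGH")})
    return pd.concat([df] * n, ignore_index=True)


def via_stack(df):
    # Approach from the first answer.
    return df.stack().reset_index(drop=True)


def via_flatten(df):
    # Approach from the second answer.
    return pd.Series(df.values.flatten(), name="Column 1")


def time_both(n, repeat=3):
    # Build the frame once, then take the best of `repeat` single runs
    # (in seconds) for each approach.
    df = get_df(n)
    results = {}
    for func in (via_stack, via_flatten):
        runs = timeit.repeat(lambda: func(df), number=1, repeat=repeat)
        results[func.__name__] = min(runs)
    return results


if __name__ == "__main__":
    for n in (1, 10_000):
        print(n, time_both(n))
```

Taking the minimum of several runs is a common way to reduce noise from other processes; increase n to see how each approach scales.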

Timings were taken at three sizes: the given sample, the given sample * 10,000, and the given sample * 1,000,000. (The timing screenshots are not available.)

piRSquared answered Oct 28 '25 23:10


