Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to merge multiple columns into one? [duplicate]

I have a dataframe with ~1M rows and around 20 columns. I'm looking to merge these columns into one; under the same unique identifier column.

To illustrate, this is the raw data:

          ID   Column 2 Column 3
0       1001          B        A
1       1002          C        D
2       1003          E        E
3       1004          G        H

Desired outcome:

          ID     Merged
0       1001          B
1       1001          A
2       1002          C
3       1002          D
4       1003          E
5       1003          E
6       1004          G
7       1004          H

Since there are ~20 columns I'm looking to merge; I need to merge them through a loop. I used the below code but I always get a memory error.

master = pd.DataFrame()
for col in original_df:  # original_df is the dataframe combining the 20 columns
    if col != 'ID':
        temp = pd.DataFrame()
        temp['ID'] = original_df['ID']
        temp['Merged'] = original_df[col]
        master = master.append([temp])
like image 865
user3468356 Avatar asked Dec 14 '22 16:12

user3468356


1 Answers

You can use df.melt, something like this:

newdf = (df.melt(id_vars='ID', value_vars = ['Column 2', 'Column 3'],
                 value_name='Merged')
         .drop('variable', axis=1)
         .sort_values('ID'))

>>> newdf
     ID Merged
0  1001      B
4  1001      A
1  1002      C
5  1002      D
2  1003      E
6  1003      E
3  1004      G
7  1004      H
like image 104
sacuL Avatar answered Dec 29 '22 10:12

sacuL