I have a dataframe with ~1M rows and around 20 columns. I'm looking to merge these columns into one; under the same unique identifier column.
To illustrate, this is the raw data:
ID Column 2 Column 3
0 1001 B A
1 1002 C D
2 1003 E E
3 1004 G H
Desired outcome:
ID Merged
0 1001 B
1 1001 A
2 1002 C
3 1002 D
4 1003 E
5 1003 E
6 1004 G
7 1004 H
Since there are ~20 columns I'm looking to merge; I need to merge them through a loop. I used the below code but I always get a memory error.
master = pd.DataFrame()
for col in original_df: # original_df is the dataframe combining the 20 columns
if col != 'ID':
temp = pd.DataFrame()
temp['ID'] = original_df['ID']
temp['Merged'] = original_df[col]
master = master.append([temp])
You can use df.melt
, something like this:
newdf = (df.melt(id_vars='ID', value_vars = ['Column 2', 'Column 3'],
value_name='Merged')
.drop('variable', axis=1)
.sort_values('ID'))
>>> newdf
ID Merged
0 1001 B
4 1001 A
1 1002 C
5 1002 D
2 1003 E
6 1003 E
3 1004 G
7 1004 H
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With