Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python Pandas: Merge Columns of Data Frame with column name into one column

I have the data in the following format in my Data Frame:

>>> df = pd.DataFrame(np.random.randn(6,4),index=dates,columns=list('ABCD'))
>>> df
          A         B         C         D
0  0.578095 -1.985742 -0.269517 -0.180319
1 -0.618431 -0.937284  0.556290 -1.416877
2  1.695109  0.122219  0.182450  0.411448
3  0.228466  0.268943 -1.249488  3.227840
4  0.005990 -0.805618 -1.941092 -0.146649
5 -1.116451 -0.649854  1.272314  1.422760

I want to combine some columns at each row by appending the row data and column names creating the following output:

          A         B        New Column
0  0.578095 -1.985742  {"C":"-0.269517","D":"-0.180319"}
1 -0.618431 -0.937284  {"C":"0.556290","D":"-1.416877"}
2  1.695109  0.122219  {"C":"0.182450","D":"0.411448"}
3  0.228466  0.268943  {"C":"-1.249488","D":"3.227840"}
4  0.005990 -0.805618  {"C":"-1.941092","D":"-0.146649"}
5 -1.116451 -0.649854  {"C":"1.272314","D":"1.422760"}

How can I achieve this in pandas?

The end game is to have the data in JSON format where Column C-D are taken as Measures for the Dimensions A-B and then store them into the table in Snowflake.

like image 298
abdthe1 Avatar asked Jul 31 '18 15:07

abdthe1


People also ask

How do I combine multiple columns into one column in pandas?

You can use DataFrame. apply() for concatenate multiple column values into a single column, with slightly less typing and more scalable when you want to join multiple columns .

How merge columns with different names pandas?

Different column names are specified for merges in Pandas using the “left_on” and “right_on” parameters, instead of using only the “on” parameter. Merging dataframes with different names for the joining variable is achieved using the left_on and right_on arguments to the pandas merge function.

How do I merge two columns from a different DataFrame in pandas?

It is possible to join the different columns is using concat() method. DataFrame: It is dataframe name. axis: 0 refers to the row axis and1 refers the column axis. join: Type of join.


2 Answers

Drop the columns and create a new one with agg:

df2 = df.drop(['C', 'D'], axis=1).assign(New_Column=
    df[['C', 'D']].agg(pd.Series.to_dict, axis=1))

df2
          A         B                                         New_Column
0 -0.645719 -0.757112  {'D': 0.8923148471642509, 'C': -0.685995130541...
1 -0.124200 -0.578526  {'D': -0.5457121278891495, 'C': -1.46006615752...
2  2.160417 -0.985475  {'D': -0.49915307027471345, 'C': 0.85388172610...
3  2.111050  1.384887  {'D': -0.4617380879640236, 'C': 0.907519279458...
4  0.781630 -0.366445  {'D': -0.3105127375402184, 'C': 0.295808587414...
5  0.460773  0.549545  {'D': -0.993162129461116, 'C': 0.8163378188816...
like image 191
cs95 Avatar answered Oct 17 '22 06:10

cs95


Using to_dict with 'records'

df['New c']=df[['C','D']].to_dict('records')
df
Out[580]: 
          A         B         C         D  \
0  0.578095 -1.985742 -0.269517 -0.180319   
1 -0.618431 -0.937284  0.556290 -1.416877   
2  1.695109  0.122219  0.182450  0.411448   
3  0.228466  0.268943 -1.249488  3.227840   
4  0.005990 -0.805618 -1.941092 -0.146649   
5 -1.116451 -0.649854  1.272314  1.422760   
                                        New c  
0            {'C': -0.269517, 'D': -0.180319}  
1              {'C': 0.55629, 'D': -1.416877}  
2               {'C': 0.18245, 'D': 0.411448}  
3              {'C': -1.249488, 'D': 3.22784}  
4  {'C': -1.9410919999999998, 'D': -0.146649}  
5               {'C': 1.272314, 'D': 1.42276}  
like image 4
BENY Avatar answered Oct 17 '22 08:10

BENY