I have the data in the following format in my Data Frame:
>>> df = pd.DataFrame(np.random.randn(6,4),index=dates,columns=list('ABCD'))
>>> df
A B C D
0 0.578095 -1.985742 -0.269517 -0.180319
1 -0.618431 -0.937284 0.556290 -1.416877
2 1.695109 0.122219 0.182450 0.411448
3 0.228466 0.268943 -1.249488 3.227840
4 0.005990 -0.805618 -1.941092 -0.146649
5 -1.116451 -0.649854 1.272314 1.422760
I want to combine some columns at each row by appending the row data and column names creating the following output:
A B New Column
0 0.578095 -1.985742 {"C":"-0.269517","D":"-0.180319"}
1 -0.618431 -0.937284 {"C":"0.556290","D":"-1.416877"}
2 1.695109 0.122219 {"C":"0.182450","D":"0.411448"}
3 0.228466 0.268943 {"C":"-1.249488","D":"3.227840"}
4 0.005990 -0.805618 {"C":"-1.941092","D":"-0.146649"}
5 -1.116451 -0.649854 {"C":"1.272314","D":"1.422760"}
How can I achieve this in pandas?
The end game is to have the data in JSON format where Column C-D are taken as Measures for the Dimensions A-B and then store them into the table in Snowflake.
You can use DataFrame. apply() for concatenate multiple column values into a single column, with slightly less typing and more scalable when you want to join multiple columns .
Different column names are specified for merges in Pandas using the “left_on” and “right_on” parameters, instead of using only the “on” parameter. Merging dataframes with different names for the joining variable is achieved using the left_on and right_on arguments to the pandas merge function.
It is possible to join the different columns is using concat() method. DataFrame: It is dataframe name. axis: 0 refers to the row axis and1 refers the column axis. join: Type of join.
Drop the columns and create a new one with agg
:
df2 = df.drop(['C', 'D'], axis=1).assign(New_Column=
df[['C', 'D']].agg(pd.Series.to_dict, axis=1))
df2
A B New_Column
0 -0.645719 -0.757112 {'D': 0.8923148471642509, 'C': -0.685995130541...
1 -0.124200 -0.578526 {'D': -0.5457121278891495, 'C': -1.46006615752...
2 2.160417 -0.985475 {'D': -0.49915307027471345, 'C': 0.85388172610...
3 2.111050 1.384887 {'D': -0.4617380879640236, 'C': 0.907519279458...
4 0.781630 -0.366445 {'D': -0.3105127375402184, 'C': 0.295808587414...
5 0.460773 0.549545 {'D': -0.993162129461116, 'C': 0.8163378188816...
Using to_dict
with 'records'
df['New c']=df[['C','D']].to_dict('records')
df
Out[580]:
A B C D \
0 0.578095 -1.985742 -0.269517 -0.180319
1 -0.618431 -0.937284 0.556290 -1.416877
2 1.695109 0.122219 0.182450 0.411448
3 0.228466 0.268943 -1.249488 3.227840
4 0.005990 -0.805618 -1.941092 -0.146649
5 -1.116451 -0.649854 1.272314 1.422760
New c
0 {'C': -0.269517, 'D': -0.180319}
1 {'C': 0.55629, 'D': -1.416877}
2 {'C': 0.18245, 'D': 0.411448}
3 {'C': -1.249488, 'D': 3.22784}
4 {'C': -1.9410919999999998, 'D': -0.146649}
5 {'C': 1.272314, 'D': 1.42276}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With