I have a dataframe with two rows and I'd like to merge the two rows to one row. The df Looks as follows:
PC Rating CY Rating PY HT
0 DE101 NaN AA GV
0 DE101 AA+ NaN GV
I have tried to create two seperate dataframes and Combine them with df.merge(df2) without success. The result should be the following
PC Rating CY Rating PY HT
0 DE101 AA+ AA GV
Any ideas? Thanks in advance Could df.update be a possible solution?
EDIT:
df.head(1).combine_first(df.tail(1))
This works for the example above. However, for columns containing numerical values, this approach doesn't yield the desired output, e.g. for
PC Rating CY Rating PY HT MV1 MV2
0 DE101 NaN AA GV 0 20
0 DE101 AA+ NaN GV 10 0
The output should be:
PC Rating CY Rating PY HT MV1 MV2
0 DE101 AA+ AA GV 10 20
The formula above doesn't sum up the values in the last two columns, but takes the values in the first row of the dataframe.
PC Rating CY Rating PY HT MV1 MV2
0 DE101 AA+ AA GV 0 20
How could this problem be fixed?
You could use max
with transpose like
In [2103]: df.max().to_frame().T
Out[2103]:
PC Rating CY Rating PY HT MV1 MV2
0 DE101 AA+ AA GV 10 20
You can make use of DF.combine_first()
method after separating the DF
into 2 parts where the null values in the first half would be replaced with the finite values in the other half while keeping it's other finite values untouched:
df.head(1).combine_first(df.tail(1))
# Practically this is same as → df.head(1).fillna(df.tail(1))
Incase there are columns of mixed datatype, partitioning them into it's constituent dtype
columns and then performing various operations on it would be feasible by chaining them across.
obj_df = df.select_dtypes(include=[np.object])
num_df = df.select_dtypes(exclude=[np.object])
obj_df.head(1).combine_first(obj_df.tail(1)).join(num_df.head(1).add(num_df.tail(1)))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With