Merge two columns into one within the same data frame in pandas/python

I want to merge two columns into one within the same DataFrame (start_end) and remove the null values. I intend to merge 'Start station' and 'End station' into a single 'station' column, keeping 'Duration' aligned with the new 'station' column. I have tried pd.merge, pd.concat and append, but I cannot work it out.

DataFrame start_end:

    Duration    End station       Start station
14  1407        NaN               14th & V St NW
19  509         NaN               21st & I St NW
20  638         15th & P St NW.   NaN
27  1532        NaN               Massachusetts Ave & Dupont Circle NW
28  759         NaN               Adams Mill & Columbia Rd NW

Expected output:

    Duration    station
14  1407        14th & V St NW
19  509         21st & I St NW
20  638         15th & P St NW
27  1532        Massachusetts Ave & Dupont Circle NW
28  759         Adams Mill & Columbia Rd NW

Code I have so far:

import pandas as pd

# start_end is the DataFrame with columns 'Duration', 'End station', 'Start station'
start_end = pd.concat([df_start, df_end])

This is what I attempted:

station = pd.merge([start_end['Start station'],start_end['End station']])
BCKN asked Jun 03 '18 01:06



2 Answers

fillna

If the NaN values are true nulls (np.nan or None):

df.assign(**{
    'Start station': df['Start station'].fillna(df['End station'])})

    Duration      End station                         Start station
14      1407              NaN                        14th & V St NW
19       509              NaN                        21st & I St NW
20       638  15th & P St NW.                       15th & P St NW.
27      1532              NaN  Massachusetts Ave & Dupont Circle NW
28       759              NaN           Adams Mill & Columbia Rd NW
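A self-contained sketch of the fillna approach that also produces the question's expected two-column output. The sample frame is rebuilt by hand from the question's table (the trailing period on row 20 is assumed to be a typo), and selecting only 'Duration' and 'station' at the end is an extra step, not part of the answer above.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question, keeping the original index labels.
df = pd.DataFrame(
    {
        "Duration": [1407, 509, 638, 1532, 759],
        "End station": [np.nan, np.nan, "15th & P St NW", np.nan, np.nan],
        "Start station": [
            "14th & V St NW",
            "21st & I St NW",
            np.nan,
            "Massachusetts Ave & Dupont Circle NW",
            "Adams Mill & Columbia Rd NW",
        ],
    },
    index=[14, 19, 20, 27, 28],
)

# Fill each missing 'Start station' with the 'End station' from the same row,
# then keep only the duration and the merged column.
result = df.assign(station=df["Start station"].fillna(df["End station"]))[
    ["Duration", "station"]
]
print(result)
```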

mask

If the NaN values are the literal string 'NaN':

df.assign(**{
    'Start station': df['Start station'].mask(
        lambda x: x == 'NaN', df['End station'])})

    Duration      End station                         Start station
14      1407              NaN                        14th & V St NW
19       509              NaN                        21st & I St NW
20       638  15th & P St NW.                       15th & P St NW.
27      1532              NaN  Massachusetts Ave & Dupont Circle NW
28       759              NaN           Adams Mill & Columbia Rd NW
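A small runnable check of the string case, on a hypothetical two-row frame where the missing values were loaded as the text 'NaN' rather than real nulls. It shows why fillna has nothing to fill there, and how mask with a boolean condition swaps in the 'End station' value from the same index label.

```python
import pandas as pd

# Hypothetical variant of the question's data: "NaN" is a string, not a null.
df = pd.DataFrame(
    {
        "Duration": [1407, 638],
        "End station": ["NaN", "15th & P St NW"],
        "Start station": ["14th & V St NW", "NaN"],
    },
    index=[14, 20],
)

# No real nulls, so fillna would leave the column unchanged.
print(df["Start station"].isna().sum())  # 0

# mask replaces entries where the condition is True with the aligned
# value from 'End station'.
station = df["Start station"].mask(df["Start station"] == "NaN", df["End station"])
result = pd.DataFrame({"Duration": df["Duration"], "station": station})
print(result)
```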
piRSquared answered Oct 28 '22 10:10



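Using combine_first, which fills the null values in the calling Series (here 'End station') with the values from the other Series (here 'Start station') at the same index: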

df["station"] = df["End station"].combine_first(df["Start station"])
df.drop(columns=["End station", "Start station"], inplace=True)
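A self-contained sketch of the combine_first version on the same sample data, rebuilt by hand from the question (the trailing period on row 20 is assumed to be a typo). It passes columns= to drop because pandas 2.0 and later no longer accept axis as a positional argument.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "Duration": [1407, 509, 638, 1532, 759],
        "End station": [np.nan, np.nan, "15th & P St NW", np.nan, np.nan],
        "Start station": [
            "14th & V St NW",
            "21st & I St NW",
            np.nan,
            "Massachusetts Ave & Dupont Circle NW",
            "Adams Mill & Columbia Rd NW",
        ],
    },
    index=[14, 19, 20, 27, 28],
)

# Keep 'End station' where present; fill its nulls from 'Start station'
# at the same index label.
df["station"] = df["End station"].combine_first(df["Start station"])

# Drop the two source columns, naming them by keyword.
df = df.drop(columns=["End station", "Start station"])
print(df)
```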
sjd answered Oct 28 '22 10:10
