I have two Pandas dataframes that I would like to merge into one. They have unequal lengths, but share some of the same information.
Here is the first dataframe:
BOROUGH TYPE TCOUNT
MAN SPORT 5
MAN CONV 3
MAN WAGON 2
BRO SPORT 2
BRO CONV 3
Here column BOROUGH specifies a location, TYPE a category and TCOUNT a count.
And the second:
BOROUGH CAUSE CCOUNT
MAN ALCOHOL 5
MAN SIZE 3
BRO ALCOHOL 2
Here BOROUGH is again the same location as in the other dataframe. But CAUSE is another category, and CCOUNT is the count for CAUSE in that location.
What I want (and haven't been able to do) is to get the following:
BOROUGH TYPE TCOUNT CAUSE CCOUNT
MAN SPORT 5 ALCOHOL 5
MAN CONV 3 SIZE 3
MAN WAGON 2 NaN NaN
BRO SPORT 2 ALCOHOL 2
BRO CONV 3 NaN NaN
The NaN cells can be anything, preferably a string saying "Nothing". If they default to NaN values, I guess it's just a matter of replacing those with a string.
EDIT:
Output:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 233 entries, 0 to 232
Data columns (total 3 columns):
BOROUGH 233 non-null object
CONTRIBUTING FACTOR VEHICLE 1 233 non-null object
RCOUNT 233 non-null int64
dtypes: int64(1), object(2)
memory usage: 7.3+ KB
None
<class 'pandas.core.frame.DataFrame'>
Int64Index: 83 entries, 0 to 82
Data columns (total 3 columns):
BOROUGH 83 non-null object
VEHICLE TYPE CODE 1 83 non-null object
VCOUNT 83 non-null int64
dtypes: int64(1), object(2)
memory usage: 2.6+ KB
None
It can be done using the merge() method.
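A minimal sketch of what merge() does if you join the two sample frames on BOROUGH alone (the frame names df and df1 follow the answer below; the data is the sample from the question). Because BOROUGH repeats on both sides, merge() pairs every matching row with every other matching row, which is why BOROUGH by itself is not enough of a key here:

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "BOROUGH": ["MAN", "MAN", "MAN", "BRO", "BRO"],
    "TYPE": ["SPORT", "CONV", "WAGON", "SPORT", "CONV"],
    "TCOUNT": [5, 3, 2, 2, 3],
})
df1 = pd.DataFrame({
    "BOROUGH": ["MAN", "MAN", "BRO"],
    "CAUSE": ["ALCOHOL", "SIZE", "ALCOHOL"],
    "CCOUNT": [5, 3, 2],
})

# The key repeats on both sides: every MAN row in df is paired with every
# MAN row in df1 (3 x 2 = 6 rows), and every BRO row with every BRO row
# (2 x 1 = 2 rows), giving 8 rows instead of 5.
naive = df.merge(df1, on="BOROUGH", how="left")
print(naive)
print(len(naive))  # 8
```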
Different columns can also be joined side by side with the concat() method, pd.concat(objs, axis=..., join=...), where axis=0 refers to the row axis and axis=1 to the column axis, and join is the type of join ('outer' or 'inner').
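A short sketch of concat() with axis=1 on the question's sample data. It aligns rows purely on the index (0 to 4 versus 0 to 2), not on BOROUGH, so row 2 pairs MAN/WAGON with the BRO row of the second frame; it is shown here to illustrate the parameters, not as a fix for this question:

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "BOROUGH": ["MAN", "MAN", "MAN", "BRO", "BRO"],
    "TYPE": ["SPORT", "CONV", "WAGON", "SPORT", "CONV"],
    "TCOUNT": [5, 3, 2, 2, 3],
})
df1 = pd.DataFrame({
    "BOROUGH": ["MAN", "MAN", "BRO"],
    "CAUSE": ["ALCOHOL", "SIZE", "ALCOHOL"],
    "CCOUNT": [5, 3, 2],
})

# axis=1 places the frames side by side; join="outer" (the default) keeps
# the union of the index labels, so rows 3 and 4 get NaN in df1's columns.
# Note the result has two BOROUGH columns.
side_by_side = pd.concat([df, df1], axis=1, join="outer")
print(side_by_side)

# join="inner" keeps only index labels present in both frames (0, 1, 2).
inner = pd.concat([df, df1], axis=1, join="inner")
print(inner)
```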
When the joining columns have different names in the two frames, specify them with the left_on and right_on arguments of the pandas merge function instead of the single on argument.
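A minimal left_on/right_on sketch. The lookup frame names and its CODE and NAME columns are hypothetical, made up only to give the key column a different name on each side:

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "BOROUGH": ["MAN", "MAN", "MAN", "BRO", "BRO"],
    "TYPE": ["SPORT", "CONV", "WAGON", "SPORT", "CONV"],
    "TCOUNT": [5, 3, 2, 2, 3],
})

# Hypothetical lookup table whose key column is called CODE, not BOROUGH.
names = pd.DataFrame({
    "CODE": ["MAN", "BRO"],
    "NAME": ["Manhattan", "Brooklyn"],
})

# left_on/right_on name the key column on each side. Because the names
# differ, both key columns are kept in the result.
labelled = df.merge(names, left_on="BOROUGH", right_on="CODE", how="left")
print(labelled)
```

If the duplicated key is not wanted, labelled.drop(columns="CODE") removes it.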
Perform a left-type merge on columns 'A','B' for the lhs and 'A','D' for the rhs, as these are your key columns (this refers to the original version of the question, where the columns were named A to E):
In [16]:
df.merge(df1, left_on=['A','B'], right_on=['A','D'], how='left')
Out[16]:
A B C D E
0 1 1 3 1 5
1 1 2 2 2 3
2 1 3 1 NaN NaN
3 2 1 1 1 2
4 2 2 4 NaN NaN
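A self-contained version of the merge above, so it can be run directly. The input frames are an assumption: they are reconstructed from the Out[16] table, since the original A to E data is no longer shown in the question:

```python
import pandas as pd

# Inputs reconstructed from the Out[16] table (assumed, not the
# question's current data).
df = pd.DataFrame({"A": [1, 1, 1, 2, 2], "B": [1, 2, 3, 1, 2], "C": [3, 2, 1, 1, 4]})
df1 = pd.DataFrame({"A": [1, 1, 2], "D": [1, 2, 1], "E": [5, 3, 2]})

# Left merge: every row of df is kept, and the pair (A, B) is matched
# against (A, D). Rows without a match get NaN in D and E.
out = df.merge(df1, left_on=["A", "B"], right_on=["A", "D"], how="left")
print(out)
```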
EDIT
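Your question has changed, but essentially here you can use combine_first, which fills the null values of one frame with the non-null values from the other, aligning rows on their index: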
In [26]:
merged = df.combine_first(df1)
merged
Out[26]:
BOROUGH CAUSE CCOUNT TCOUNT TYPE
0 MAN ALCOHOL 5 5 SPORT
1 MAN SIZE 3 3 CONV
2 MAN ALCOHOL 2 2 WAGON
3 BRO NaN NaN 2 SPORT
4 BRO NaN NaN 3 CONV
The NaN you see in 'CAUSE' and 'CCOUNT' marks missing values, so we can use fillna to replace them:
In [27]:
merged['CAUSE'] = merged['CAUSE'].fillna('Nothing')
merged['CCOUNT'] = merged['CCOUNT'].fillna(0)
merged
Out[27]:
BOROUGH CAUSE CCOUNT TCOUNT TYPE
0 MAN ALCOHOL 5 5 SPORT
1 MAN SIZE 3 3 CONV
2 MAN ALCOHOL 2 2 WAGON
3 BRO Nothing 0 2 SPORT
4 BRO Nothing 0 3 CONV
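Note that combine_first aligns on the row index, not on BOROUGH: in the output above, row 2 pairs MAN/WAGON with the CAUSE and CCOUNT of the BRO row of the second frame (ALCOHOL, 2), which differs from the desired output in the question. A sketch of one alternative (not part of the original answer) numbers the rows within each borough with groupby().cumcount() and merges on BOROUGH plus that counter, so the n-th TYPE row of a borough meets the n-th CAUSE row of the same borough. The helper column name pos is hypothetical:

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "BOROUGH": ["MAN", "MAN", "MAN", "BRO", "BRO"],
    "TYPE": ["SPORT", "CONV", "WAGON", "SPORT", "CONV"],
    "TCOUNT": [5, 3, 2, 2, 3],
})
df1 = pd.DataFrame({
    "BOROUGH": ["MAN", "MAN", "BRO"],
    "CAUSE": ["ALCOHOL", "SIZE", "ALCOHOL"],
    "CCOUNT": [5, 3, 2],
})

# Position of each row within its borough: 0, 1, 2, ... ("pos" is a
# hypothetical helper column).
left = df.assign(pos=df.groupby("BOROUGH").cumcount())
right = df1.assign(pos=df1.groupby("BOROUGH").cumcount())

# Left merge on borough + position keeps all rows of df; rows with no
# matching cause get NaN. The helper column is dropped afterwards.
result = left.merge(right, on=["BOROUGH", "pos"], how="left").drop(columns="pos")

# Missing causes become "Nothing", as asked in the question.
result["CAUSE"] = result["CAUSE"].fillna("Nothing")
print(result)
```

CCOUNT is left as NaN for unmatched rows (and therefore becomes a float column); result["CCOUNT"].fillna(0) can replace those in the same way if a number is preferred.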