I am a beginner in Python and coding. I need help comparing two dataframes that have different lengths and different column labels, except for one shared column. That shared column is the one I want to compare the dataframes by. My data looks like this:
df:
'fruits' | 'trees' | 'sports' | 'countries'
bananas | mongolia | basketball | Spain
grapes | Oak | rugby | Thailand
oranges | Osage Orange | baseball | Egypt
apples | Maple | golf | Chile
df2:
'cars' | 'flowers' | 'countries' | 'vegetables'
Audi | Rose | Spain | Carrots
BMW | Tulip | Nigeria | Celery
Honda | Dandelion | Egypt | Onion
I would like to compare these two dataframes based on the column 'countries' and create three separate outputs, each in its own dataframe. I have been using pandas and have used pd.concat to combine df and df2 into one. I would also like to keep the rows that don't match rather than discarding them.
Here are my desired outputs:
Output #1: Values in df NOT in df2:
df3:
'fruits' | 'trees' | 'sports' | 'countries'
grapes | Oak | rugby | Thailand
apples | Maple | golf | Chile
Output #2: Values in df2 NOT in df:
df4:
'cars' | 'flowers' | 'countries' | 'vegetables'
BMW | Tulip | Nigeria | Celery
Output #3: Values in both df AND df2 (with the columns from both dataframes combined):
df5:
'fruits' | 'trees' | 'sports' | 'cars' | 'flowers' | 'countries' | 'vegetables'
bananas | mongolia | basketball | Audi | Rose | Spain | Carrots
oranges | Osage Orange | baseball | Honda | Dandelion | Egypt | Onion
I hope this all makes sense. I have tried many different things (isin, DataFrame.diff and .difference, df - df2, NumPy arrays, etc.). I have looked all over and I can't find exactly what I'm looking for. Any help would be greatly appreciated! Thank you!
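Since isin is mentioned above, here is a minimal sketch of how it could produce outputs #1 and #2, with an inner merge for output #3. It assumes the shared column is named 'countries' in both frames and that each country appears at most once per frame (with duplicates, the inner merge returns one row per matching pair). The names df3, df4 and df5 simply follow the desired outputs in the question.

```python
from io import StringIO

import pandas as pd

# The question's sample data, as CSV so "Osage Orange" stays a single cell
df = pd.read_csv(StringIO("""fruits,trees,sports,countries
bananas,mongolia,basketball,Spain
grapes,Oak,rugby,Thailand
oranges,Osage Orange,baseball,Egypt
apples,Maple,golf,Chile"""))
df2 = pd.read_csv(StringIO("""cars,flowers,countries,vegetables
Audi,Rose,Spain,Carrots
BMW,Tulip,Nigeria,Celery
Honda,Dandelion,Egypt,Onion"""))

# Boolean masks: does each row's country also appear in the other frame?
in_df2 = df["countries"].isin(df2["countries"])
in_df = df2["countries"].isin(df["countries"])

df3 = df[~in_df2]  # output #1: rows of df whose country is not in df2
df4 = df2[~in_df]  # output #2: rows of df2 whose country is not in df

# Output #3: an inner merge keeps only countries present in both frames
df5 = df.merge(df2, on="countries", how="inner")

print(df3, df4, df5, sep="\n\n")
```

Unlike an outer merge, filtering with a boolean mask keeps each frame's own columns untouched, so outputs #1 and #2 contain no extra NaN columns.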
The equals() method checks directly whether df is equal to df2: it tests whether two DataFrame objects have the same shape and the same elements. Unlike the DataFrame.eq() method, which compares element-wise and returns a DataFrame of booleans, equals() returns a single scalar boolean. Because df and df2 have different shapes and columns, df.equals(df2) is simply False here.
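To illustrate the difference between the two methods, here is a small sketch. The frames a, b and c are hypothetical toy data, not the question's dataframes.

```python
import pandas as pd

# Hypothetical toy frames with identical labels; one cell differs
a = pd.DataFrame({"countries": ["Spain", "Egypt"], "n": [1, 2]})
b = pd.DataFrame({"countries": ["Spain", "Chile"], "n": [1, 2]})
# Hypothetical frame with different columns and length, like df vs df2
c = pd.DataFrame({"cars": ["Audi"], "countries": ["Spain"]})

# equals() collapses the whole comparison into one boolean
print(a.equals(b))         # one cell differs
print(a.equals(a.copy()))  # identical contents
print(a.equals(c))         # different shape: no error, just not equal

# eq() compares cell by cell and returns a DataFrame of booleans
cellwise = a.eq(b)
print(cellwise)
```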
The compare() method in pandas shows the differences between two DataFrames cell by cell, presenting the values from each frame side by side. However, it can only compare identically-labeled DataFrames, meaning the same shape with identical row and column labels, so it cannot be applied directly to df and df2.
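A minimal sketch of both behaviors, again using hypothetical toy frames a, b and c rather than the question's data: with matching labels, compare() keeps only the differing cells under a (column, "self"/"other") column MultiIndex; with mismatched labels it raises a ValueError.

```python
import pandas as pd

# Hypothetical toy frames with identical labels; only row 1 differs
a = pd.DataFrame({"countries": ["Spain", "Egypt"], "n": [1, 2]})
b = pd.DataFrame({"countries": ["Spain", "Chile"], "n": [1, 2]})

# By default, rows and columns that are equal everywhere are dropped
diff = a.compare(b)
print(diff)

# Hypothetical frame with different labels, like df vs df2 in the question
c = pd.DataFrame({"cars": ["Audi"], "countries": ["Spain"]})
failed = False
try:
    a.compare(c)
except ValueError as exc:
    failed = True
    print("compare() refused:", exc)
```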
Setup:
from io import StringIO  # Python 3; Python 2 used "from StringIO import StringIO"
import pandas as pd
txt1 = """fruits,trees,sports,countries
bananas,mongolia,basketball,Spain
grapes,Oak,rugby,Thailand
oranges,Osage Orange,baseball,Egypt
apples,Maple,golf,Chile"""
txt2 = """cars,flowers,countries,vegetables
Audi,Rose,Spain,Carrots
BMW,Tulip,Nigeria,Celery
Honda,Dandelion,Egypt,Onion"""
df = pd.read_csv(StringIO(txt1))
df2 = pd.read_csv(StringIO(txt2))
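def outer_parts(df1, df2):
    # Outer merge on the shared 'countries' column; indicator=True adds a
    # '_merge' column saying whether each row matched ('both') or not
    df3 = df1.merge(df2, indicator=True, how='outer')
    # One dataframe per '_merge' category, with the helper column dropped
    return {n: g.drop(columns='_merge') for n, g in df3.groupby('_merge', observed=True)}
dfs = outer_parts(df, df2)
print(dfs['both'])        # output #3
print(dfs['left_only'])   # output #1 (df2's columns are NaN here)
print(dfs['right_only'])  # output #2 (df's columns are NaN here)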
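The outer merge gives unmatched rows all-NaN columns from the other frame. A possible follow-up, sketched here under the assumption that you want outputs #1 and #2 to keep only their original columns, is to select each frame's own columns from the corresponding piece. This is a small post-processing step of my own, not part of the answer above.

```python
from io import StringIO

import pandas as pd

# The question's sample data
df = pd.read_csv(StringIO("""fruits,trees,sports,countries
bananas,mongolia,basketball,Spain
grapes,Oak,rugby,Thailand
oranges,Osage Orange,baseball,Egypt
apples,Maple,golf,Chile"""))
df2 = pd.read_csv(StringIO("""cars,flowers,countries,vegetables
Audi,Rose,Spain,Carrots
BMW,Tulip,Nigeria,Celery
Honda,Dandelion,Egypt,Onion"""))


def outer_parts(df1, df2):
    # Outer merge on the shared column(s), tagging each row's origin in '_merge'
    merged = df1.merge(df2, indicator=True, how="outer")
    return {name: g.drop(columns="_merge")
            for name, g in merged.groupby("_merge", observed=True)}


dfs = outer_parts(df, df2)

# Keep only each frame's own columns, which drops the all-NaN ones
df3 = dfs["left_only"][df.columns].reset_index(drop=True)   # output #1
df4 = dfs["right_only"][df2.columns].reset_index(drop=True)  # output #2
df5 = dfs["both"].reset_index(drop=True)                      # output #3

print(df3, df4, df5, sep="\n\n")
```

Row order in these results follows the merge's own ordering, so sort by 'countries' afterwards if order matters.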