Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Merging two pandas dataframes

I've seen this question a few times, but the answers are not working for me. I have two dataframes, split_df and csv_df that I;m trying to merge on a column that is called key in each of them.

Here's the top 5 items from split_df

        key        tags
2785  30000        ipv6
2785  30000  networking
2785  30000    wireless
2785  30000   standards
2785  30000      satcom

Here's the top 5 items from the csv_df

     key
0  30000
1  30002
2  31288
3  33630
4  31663

When I run this merge statement:

common_df = pd.merge(split_df,csv_df, on=['key'], suffixes=('_left', '_right'))

I get:

Empty DataFrame
Columns: [key, tags]
Index: []

Process finished with exit code 0

I can't figure out why the merge isn't working for me. Any suggestions?

I have indexes on both (and have run the merge with them out as well)

csv_df.set_index('key')
split_df.set_index('key')

I am very new to Pyton and Pandas

Thanks!

like image 396
S Coe Avatar asked Mar 03 '26 16:03

S Coe


2 Answers

From the above code this works for me:

In [11]: pd.merge(split_df,csv_df, on=['key'], suffixes=('_left', '_right'))
Out[11]:
     key        tags
0  30000        ipv6
1  30000  networking
2  30000    wireless
3  30000   standards
4  30000      satcom

My suspicion is that the column type key is different in your dataframes (and the same int in mine).

Most likely one of them is a string (and the other a float/int).

In [12]: split_df["key"] = split_df["key"].apply(str)

In [13]: pd.merge(split_df,csv_df, on=['key'], suffixes=('_left', '_right'))
Out[13]:
Empty DataFrame
Columns: [key, tags]
Index: []

If they're both strings:

In [14]: csv_df["key"] = csv_df["key"].apply(str)

In [15]: pd.merge(split_df,csv_df, on=['key'], suffixes=('_left', '_right'))
Out[15]:
     key        tags
0  30000        ipv6
1  30000  networking
2  30000    wireless
3  30000   standards
4  30000      satcom
like image 137
Andy Hayden Avatar answered Mar 06 '26 06:03

Andy Hayden


The key column in one of your dataframes is int while the other is str

split_df.key = split_df.key.astype(int)
csv_df.key = csv_df.key.astype(int)

pd.merge(split_df,csv_df, on=['key'], suffixes=('_left', '_right'))
like image 39
piRSquared Avatar answered Mar 06 '26 06:03

piRSquared