Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python pandas merge keyerror

Consistently getting a keyerror when I try to merge two data frames. The code:

c = pd.merge(a, b, on='video_id', how='left')

Based on internet research I double checked the dtype and coerced both to int:

a = pd.read_csv(filename, index_col=False, dtype={'video_id': np.int64}, low_memory=False)
b = pd.read_csv(videoinfo, index_col=False, dtype={'video_id': np.int64})

Renamed the columns (to make sure they match):

a.columns.values[2] = "video_id"
b.columns.values[0] = "video_id"

Coerced to df:

c = pd.merge(pd.DataFrame(a), pd.DataFrame(b), on='video_id', how='left')

Out of ideas as to why I'm still getting the keyerror. And it's always "KeyError: 'video_id'"

like image 923
Sara Avatar asked Dec 11 '15 15:12

Sara


People also ask

How do I fix pandas KeyError?

How to Fix the KeyError? We can simply fix the error by correcting the spelling of the key. If we are not sure about the spelling we can simply print the list of all column names and crosscheck.

Can you merge objects pandas?

Pandas DataFrame merge() function is used to merge two DataFrame objects with a database-style join operation. The joining is performed on columns or indexes. If the joining is done on columns, indexes are ignored. This function returns a new DataFrame and the source DataFrame objects are unchanged.

How do I merge pandas series?

Combine Two Series Using pandas. merge() can be used for all database join operations between DataFrame or named series objects. You have to pass an extra parameter “name” to the series in this case. For instance, pd. merge(S1, S2, right_index=True, left_index=True) .

How avoid duplicates in pandas merge?

Use the drop() function to remove the columns with the suffix 'remove'. This will ensure that identical columns don't exist in the new dataframe.


2 Answers

You want to be careful not to use df.columns.values to rename columns. Doing so screws with the indexing on your column names.

If you know which column names you're replacing, you can try something like this:

a.rename(columns={'old_col_name':'video_id'}, inplace = True)
b.rename(columns={'old_col_name':'video_id'}, inplace = True)

If you don't know the column names ahead of time, you can try:

col_names_a = a.columns
col_names_a[index] = 'video_id'
a.columns = col_names_a

Keep in mind, you actually don't need to use the same column names on both dataframes. Pandas allows you to specify the individual names in each dataframe

pd.merge(a, b, left_on = 'a_col', right_on = 'b_col', how = 'left')
like image 115
Amy D Avatar answered Sep 18 '22 22:09

Amy D


There was a leading space in one of the dfs in the column name, 'video_id ' instead of 'video_id'. Not sure why the initial rename didn't fix that but it's fixed.

like image 34
Sara Avatar answered Sep 21 '22 22:09

Sara