 

Pandas Apply Key Error

I'm fairly new to Python and data science. I'm working on the Kaggle Outbrain competition, and all datasets referenced in my code can be found at https://www.kaggle.com/c/outbrain-click-prediction/data.

On to the problem: I have a dataframe with columns ['document_id', 'category_id', 'confidence_level']. I would like to add a fourth column, 'max_cat', holding the 'category_id' of the row with the greatest 'confidence_level' among all rows that share the current row's 'document_id'.

    import pandas as pd
    import numpy

    main_folder = r'...filepath\data_location' + '\\'

    docs_meta = pd.read_csv(main_folder + 'documents_meta.csv\documents_meta.csv', nrows=1000)
    docs_categories = pd.read_csv(main_folder + 'documents_categories.csv\documents_categories.csv', nrows=1000)
    docs_entities = pd.read_csv(main_folder + 'documents_entities.csv\documents_entities.csv', nrows=1000)
    docs_topics = pd.read_csv(main_folder + 'documents_topics.csv\documents_topics.csv', nrows=1000)

    def find_max(row, the_df, groupby_col, value_col, target_col):
        return the_df[the_df[groupby_col] == row[groupby_col]].loc[the_df[value_col].idxmax()][target_col]

    test = docs_categories.copy()
    test['max_cat'] = test.apply(lambda x: find_max(x, test, 'document_id', 'confidence_level', 'category_id'))

This gives me the error: KeyError: ('document_id', 'occurred at index document_id')
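The same failure can be reproduced without the Kaggle files. The sketch below uses a small made-up frame with the same column names as documents_categories.csv (the values are invented), and raises_key_error is a hypothetical helper. Because apply is called without an axis argument, each x it receives is a whole column indexed by row labels, so looking up 'document_id' inside it fails.

```python
import pandas as pd

# Hypothetical stand-in for documents_categories.csv (made-up values, not Kaggle data)
df = pd.DataFrame({
    "document_id": [1, 1, 2, 2],
    "category_id": [10, 20, 30, 40],
    "confidence_level": [0.9, 0.1, 0.2, 0.8],
})

def raises_key_error(frame):
    """Return True if a row-style lookup inside a default (axis=0) apply raises KeyError."""
    try:
        # axis defaults to 0, so x is a column whose index holds the row labels 0..3
        frame.apply(lambda x: x["document_id"])
    except KeyError:
        return True
    return False

print(raises_key_error(df))  # True
```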

Can anyone explain why this error occurs, or suggest a more efficient way to achieve my goal?

Thanks!

asked Oct 10 '16 at 14:10 by user133248




1 Answer

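As EdChum answered in the comments, the issue is that apply works column-wise by default (axis=0; see the DataFrame.apply docs). The function is therefore called once per column, and each x it receives is a whole column whose index holds the row labels, not the column names. The lookup row['document_id'] inside find_max fails for that reason, and the error message names the column that was being processed at the time ('occurred at index document_id').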
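A quick way to see what apply hands to the function is to return each Series' name. With the default axis=0 the names are the column labels; with axis=1 they are the row labels, and the column labels become valid keys. The frame below is made-up data for illustration only.

```python
import pandas as pd

# Hypothetical two-document frame used only for illustration
df = pd.DataFrame({
    "document_id": [1, 1, 2],
    "category_id": [10, 20, 30],
    "confidence_level": [0.9, 0.1, 0.8],
})

# axis=0 (the default): the function receives each column; .name is the column label
col_names = df.apply(lambda s: s.name)
print(col_names.tolist())  # ['document_id', 'category_id', 'confidence_level']

# axis=1: the function receives each row; .name is the row label
row_names = df.apply(lambda s: s.name, axis=1)
print(row_names.tolist())  # [0, 1, 2]

# Inside a row, column labels work as keys (values may be upcast to float
# here, because the frame mixes int and float columns)
doc_ids = df.apply(lambda s: s["document_id"], axis=1)
```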

To specify that it should be applied to each row instead, axis=1 must be passed:

    test['max_cat'] = test.apply(lambda x: find_max(x, test, 'document_id', 'confidence_level', 'category_id'), axis=1)
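Passing axis=1 fixes the KeyError raised by apply itself, but find_max as posted has a second problem: the_df[value_col].idxmax() is evaluated on the whole frame, so for every document other than the one holding the overall maximum, .loc is asked for a label that is not in the filtered subset and raises another KeyError. The following is a minimal sketch, assuming the column names from the question, of a corrected helper plus a vectorized groupby alternative that avoids filtering the frame once per row. The data is made up, and max_cat_rowwise and max_cat_grouped are hypothetical names.

```python
import pandas as pd

# Made-up stand-in for documents_categories.csv
df = pd.DataFrame({
    "document_id": [1, 1, 2, 2, 3],
    "category_id": [10, 20, 30, 40, 50],
    "confidence_level": [0.9, 0.1, 0.2, 0.8, 0.5],
})

def find_max(row, the_df, groupby_col, value_col, target_col):
    """Row-wise helper: idxmax is taken on the filtered subset, not the whole frame."""
    subset = the_df[the_df[groupby_col] == row[groupby_col]]
    return subset.loc[subset[value_col].idxmax(), target_col]

def max_cat_rowwise(frame):
    # One boolean filter per row, so the cost grows quadratically with the row count
    return frame.apply(
        lambda x: find_max(x, frame, "document_id", "confidence_level", "category_id"),
        axis=1,
    )

def max_cat_grouped(frame):
    # Label of the highest-confidence row in each document (first one on ties)
    best_rows = frame.groupby("document_id")["confidence_level"].idxmax()
    # Mapping document_id -> category_id of that row
    best_cat = frame.loc[best_rows.to_numpy()].set_index("document_id")["category_id"]
    # Broadcast the per-document result back to every row
    return frame["document_id"].map(best_cat)

test = df.copy()
test["max_cat"] = max_cat_grouped(test)
print(test)
```

Both functions produce the same column on this sample; the groupby version does one pass per group instead of one full-frame filter per row, which matters once nrows=1000 is lifted.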
answered Oct 02 '22 at 21:10 by OriolAbril