Pandas Apply Key Error

I'm fairly new to Python and data science. I'm working on the Kaggle Outbrain competition, and all datasets referenced in my code can be found at https://www.kaggle.com/c/outbrain-click-prediction/data.

On to the problem: I have a dataframe with columns ['document_id', 'category_id', 'confidence_level']. I would like to add a fourth column, 'max_cat', that returns the 'category_id' value that corresponds to the greatest 'confidence_level' value for the row's 'document_id'.

import pandas as pd
import numpy

main_folder = r'...filepath\data_location' + '\\'

docs_meta = pd.read_csv(main_folder + 'documents_meta.csv\documents_meta.csv',nrows=1000)
docs_categories = pd.read_csv(main_folder + 'documents_categories.csv\documents_categories.csv',nrows=1000)
docs_entities = pd.read_csv(main_folder + 'documents_entities.csv\documents_entities.csv',nrows=1000)
docs_topics = pd.read_csv(main_folder + 'documents_topics.csv\documents_topics.csv',nrows=1000)

def find_max(row,the_df,groupby_col,value_col,target_col):
    return the_df[the_df[groupby_col]==row[groupby_col]].loc[the_df[value_col].idxmax()][target_col]

test = docs_categories.copy()
test['max_cat'] = test.apply(lambda x: find_max(x,test,'document_id','confidence_level','category_id'))

This gives me the error: KeyError: ('document_id', 'occurred at index document_id')

Can anyone help explain either why this error occurred, or how to achieve my goal in a more efficient manner?

Thanks!

asked Oct 10 '16 14:10

user133248

1 Answer

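As EdChum pointed out in the comments, the issue is that apply works column-wise by default (axis=0; see the docs). The function is therefore called once per column and receives a Series indexed by the row labels, so row['document_id'] looks for a row labelled 'document_id' and raises the KeyError.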
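To make the difference concrete, here is a minimal sketch on a small made-up frame (the values are hypothetical, not taken from the Outbrain files). It shows what the function receives under each axis setting and reproduces the KeyError:

```python
import pandas as pd

# Hypothetical toy data standing in for docs_categories.
df = pd.DataFrame({
    "document_id": [1, 1, 2],
    "category_id": [10, 20, 30],
    "confidence_level": [0.9, 0.1, 0.5],
})

# Default (axis=0): the function is called once per column,
# so the Series it receives is named after the column.
names_axis0 = df.apply(lambda s: s.name).tolist()

# axis=1: the function is called once per row,
# so the Series it receives is named after the row label.
names_axis1 = df.apply(lambda s: s.name, axis=1).tolist()

# Looking up a column name on a column Series fails, because that
# Series is indexed by the row labels 0, 1, 2.
try:
    df.apply(lambda row: row["document_id"])
    raised_key_error = False
except KeyError:
    raised_key_error = True

# With axis=1 the same lookup works, because each row Series is
# indexed by the column names.
doc_ids = df.apply(lambda row: row["document_id"], axis=1).tolist()
```

Here names_axis0 holds the three column names, while names_axis1 holds the row labels 0, 1 and 2.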

To specify that it should be applied to each row instead, axis=1 must be passed:

test.apply(lambda x: find_max(x,test,'document_id','confidence_level','category_id'), axis=1)
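On the second part of the question (a more efficient approach), one possible alternative is to skip the row-wise apply altogether and use groupby with idxmax. This is only a sketch on hypothetical toy data, and the names best_rows and best_cat are made up for illustration. Note also that find_max as written takes idxmax over the whole frame rather than over the filtered rows, so even with axis=1 the .loc lookup can fail for documents that do not contain the overall maximum; the grouped version avoids that.

```python
import pandas as pd

# Hypothetical toy data standing in for docs_categories.
test = pd.DataFrame({
    "document_id": [1, 1, 2, 2, 2],
    "category_id": [10, 20, 30, 40, 50],
    "confidence_level": [0.9, 0.1, 0.2, 0.7, 0.5],
})

# For each document_id, the index label of its highest-confidence row.
best_rows = test.groupby("document_id")["confidence_level"].idxmax()

# The category_id found at those rows, re-keyed by document_id.
best_cat = pd.Series(
    test.loc[best_rows, "category_id"].to_numpy(),
    index=best_rows.index,
)

# Broadcast the per-document result back onto every row.
test["max_cat"] = test["document_id"].map(best_cat)
```

This performs one grouped pass instead of filtering the whole frame once per row, which should matter once nrows=1000 is removed.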

answered Oct 02 '22 21:10

OriolAbril

Tags:

python

pandas

group-by

kaggle

keyerror
