How to avoid Keyerror named 'kernelspec' in Papermill?

I am running a papermill command from within Airflow (running in Docker). The script is stored on S3 and I run it using the Python client of papermill. It ends up in an error that I cannot make sense of:

Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/ipython_genutils/ipstruct.py", line 132, in __getattr__
result = self[key]
KeyError: 'kernelspec'

I tried looking into the documentation, but in vain.

The code that I am using to run the papermill command is:

import time
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python_operator import PythonOperator
from mypackage.datastore import db
from mypackage.workflow.transform.jupyter_notebook import run_jupyter_notebook


dag_id = "jupyter-test-dag"
default_args = {
    'owner': "aviral",
    'depends_on_past': False,
    'start_date': "2019-02-28T00:00:00",
    'email': "aviral@some_org.com",
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 0,
    'retry_delay': timedelta(minutes=5),
    'provide_context': True
}

dag = DAG(
    dag_id,
    catchup=False,
    default_args=default_args,
    schedule_interval=None,
    max_active_runs=1
)


def print_context(ds, **kwargs):
    print(kwargs)
    print(ds)
    return 'Whatever you return gets printed in the logs'


def run_python_jupyter(**kwargs):
    run_jupyter_notebook(
        script_location=kwargs["script_location"]
    )


create_job_task = PythonOperator(
    task_id="create_job",
    python_callable=run_python_jupyter,
    dag=dag,
    op_kwargs={
            "script_location": "s3://some_bucket/python3_file_write.ipynb"
    }
)

globals()[dag_id] = dag

The function run_jupyter_notebook is:

import papermill as pm


def run_jupyter_notebook(**kwargs):
    """Runs a Jupyter notebook with papermill."""
    script_location = kwargs.get('script_location', '')
    if not script_location:
        raise ValueError(
            "Script location was not provided."
        )
    # Write the executed notebook next to the input, with an "_output" suffix.
    output_location = script_location.split('.ipynb')[0] + "_output.ipynb"
    pm.execute_notebook(script_location, output_location)

I expect the code to run without any error, as I have already run it locally (using local filesystem paths instead of the S3 paths).

Asked May 07 '19 by Aviral Srivastava


1 Answer

Jupyter adds metadata to your notebook. Your error is related to the fact that some metadata, under the key kernelspec, is missing.

Example of the kernelspec object in notebook metadata:

"kernelspec": {
    "display_name": "Python 3",
    "language": "python",
    "name": "python3"
}

Thus, to solve your error you need to correct the notebook metadata by adding a valid kernelspec object. The simplest way of doing this is to edit the notebook JSON document and add a kernelspec object inside the first-level metadata object.

"metadata": {
    "kernelspec": {
        "display_name": "Python 3",
        "language": "python",
        "name": "python3"
    },
    "language_info": {
        "codemirror_mode": {
            "name": "python",
            "version": 3
        }
    }
}
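
If you prefer to patch the metadata in code rather than editing the JSON by hand, a minimal sketch using the nbformat library could look like the following (the local file name is just an example; you would first need a local copy of the notebook, since it lives on S3 in your case):

import nbformat

# Load the notebook; nbformat parses the JSON structure for you.
nb = nbformat.read("python3_file_write.ipynb", as_version=4)

# Add the missing kernelspec only if it is not already present.
nb.metadata.setdefault("kernelspec", {
    "display_name": "Python 3",
    "language": "python",
    "name": "python3",
})

# Write the notebook back with the corrected metadata.
nbformat.write(nb, "python3_file_write.ipynb")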

Your error might also come from the fact that you are using a tool to strip notebook outputs, such as the nbstripout Python package. If that is the case, take care when changing the nbstripout settings and follow its documentation.
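
To confirm whether the kernelspec key is actually present in the notebook you are executing, a quick check on a local copy of the file (the path here is only illustrative) could be:

import json

# Print the top-level metadata so you can see whether "kernelspec"
# survived whatever tooling has touched the notebook.
with open("python3_file_write.ipynb") as f:
    print(json.dumps(json.load(f).get("metadata", {}), indent=2))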

Answered Oct 26 '22 by fpajot