I am running a papermill command from within Airflow (in Docker). The notebook is stored on S3 and I run it using papermill's Python client. It fails with an error that I cannot make sense of:
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/ipython_genutils/ipstruct.py", line 132, in __getattr__
    result = self[key]
KeyError: 'kernelspec'
I looked through the documentation, but in vain.
The code I am using to run the papermill command is:
import time
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python_operator import PythonOperator

from mypackage.datastore import db
from mypackage.workflow.transform.jupyter_notebook import run_jupyter_notebook

dag_id = "jupyter-test-dag"

default_args = {
    'owner': "aviral",
    'depends_on_past': False,
    'start_date': "2019-02-28T00:00:00",
    'email': "aviral@some_org.com",
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 0,
    'retry_delay': timedelta(minutes=5),
    'provide_context': True
}

dag = DAG(
    dag_id,
    catchup=False,
    default_args=default_args,
    schedule_interval=None,
    max_active_runs=1
)


def print_context(ds, **kwargs):
    print(kwargs)
    print(ds)
    return 'Whatever you return gets printed in the logs'


def run_python_jupyter(**kwargs):
    run_jupyter_notebook(
        script_location=kwargs["script_location"]
    )


create_job_task = PythonOperator(
    task_id="create_job",
    python_callable=run_python_jupyter,
    dag=dag,
    op_kwargs={
        "script_location": "s3://some_bucket/python3_file_write.ipynb"
    }
)

globals()[dag_id] = dag
The function run_jupyter_notebook is:
import papermill as pm


def run_jupyter_notebook(**kwargs):
    """Runs Jupyter notebook"""
    script_location = kwargs.get('script_location', '')
    if not script_location:
        raise ValueError(
            "Script location was not provided."
        )
    # Write the executed notebook next to the input, with an "_output" suffix
    pm.execute_notebook(
        script_location,
        script_location.split('.ipynb')[0] + "_output" + ".ipynb"
    )
I expect the code to run without errors, since it runs fine locally (using local filesystem paths rather than S3 paths).
Jupyter stores metadata in your notebook. Your error occurs because the kernelspec key is missing from that metadata.
Example of the kernelspec object in notebook metadata:
"kernelspec": {
    "display_name": "Python 3",
    "language": "python",
    "name": "python3"
}
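To confirm that this is the cause, you can inspect the notebook file before handing it to papermill. The following is a minimal sketch using only the standard library; the helper names has_kernelspec and file_has_kernelspec are hypothetical, and it assumes the notebook has been copied to a local path first (for example, downloaded from S3).

```python
import json


def has_kernelspec(notebook):
    """Return True if the parsed notebook dict has a named kernelspec.

    `notebook` is the notebook's JSON document loaded into a dict.
    (Hypothetical helper, not part of papermill.)
    """
    kernelspec = notebook.get("metadata", {}).get("kernelspec") or {}
    return bool(kernelspec.get("name"))


def file_has_kernelspec(notebook_path):
    """Load a local .ipynb file and check it for a kernelspec."""
    with open(notebook_path, encoding="utf-8") as fh:
        return has_kernelspec(json.load(fh))
```

If the check returns False for the notebook stored on S3 but True for your local copy, the two files differ in their metadata, which would explain why the local run succeeds.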
Thus, to fix the error you need to add a valid kernelspec object to the notebook metadata. The simplest way to do this is to edit the notebook's JSON document and add a kernelspec object to the top-level metadata object:
"metadata": {
    "kernelspec": {
        "display_name": "Python 3",
        "language": "python",
        "name": "python3"
    },
    "language_info": {
        "codemirror_mode": {
            "name": "python",
            "version": 3
        }
    }
}
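If you would rather not edit the JSON by hand, the same change can be scripted. This is a rough sketch with the standard library only; ensure_kernelspec, patch_notebook_file and DEFAULT_KERNELSPEC are hypothetical names, the default values are simply the Python 3 kernelspec shown above, and the file is assumed to be a local copy of the notebook that you upload back to S3 yourself afterwards.

```python
import json

# Assumed default: the Python 3 kernelspec from the example above.
DEFAULT_KERNELSPEC = {
    "display_name": "Python 3",
    "language": "python",
    "name": "python3",
}


def ensure_kernelspec(notebook, kernelspec=None):
    """Add a kernelspec to the notebook dict if it has none; return the dict."""
    metadata = notebook.setdefault("metadata", {})
    if not metadata.get("kernelspec"):
        metadata["kernelspec"] = dict(kernelspec or DEFAULT_KERNELSPEC)
    return notebook


def patch_notebook_file(notebook_path):
    """Rewrite a local .ipynb file in place so that it carries a kernelspec."""
    with open(notebook_path, encoding="utf-8") as fh:
        notebook = json.load(fh)
    ensure_kernelspec(notebook)
    with open(notebook_path, "w", encoding="utf-8") as fh:
        json.dump(notebook, fh, indent=1)
```

An existing kernelspec is left untouched, so running the patch on a notebook that is already valid does not change its kernel.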
Your error might come from using a cleaner, such as the nbstripout Python package, to get rid of notebook outputs. If that is the case, adjust the nbstripout settings as described in its documentation.