
Running Python startup code after modules are loaded

I'm working with Jupyter notebooks and Python kernels with a SparkContext. A coworker has written some Python code that wires Spark events with ipykernel events. When we import his module from a notebook cell, it works in all combinations we need to support: Python 2.7 and 3.5, Spark 1.6 and 2.x, Linux only.

Now we want to enable that code automatically for all Python kernels. I put the import into our sitecustomize.py. That works fine for Spark 2.x, but not for Spark 1.6. Kernels with Spark 1.6 don't get an sc anymore, and something is so screwed up that unrelated imports like matplotlib.cbook fail. When I delay that import for a few seconds using a timer, it works. Apparently, the code in sitecustomize.py is executed too early for importing the module which connects Spark with the ipykernel.

I'm looking for a way to delay that import until Spark and/or ipykernel are fully initialized. But it should still execute as part of the kernel startup, before any notebook cells get executed. I found this trick to delay code execution until sys.argv is initialized. But I don't think it can work on global variables like sc, considering that Python globals are still local to modules. So far, the best I can come up with is using a timer to check every second whether certain modules are present in sys.modules. But that isn't very reliable, because I don't know how to distinguish a module that's fully initialized from one that's still in the process of being loaded.
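For illustration, the timer-based polling idea could look roughly like the sketch below. The function name `run_when_loaded` and its parameters `interval` and `max_tries` are hypothetical, chosen only for this example. It shares the weakness described above: a name appears in sys.modules as soon as the import starts, so this check cannot tell a fully initialized module from one that is still loading.

```python
import sys
import threading


def run_when_loaded(module_names, callback, interval=1.0, max_tries=30):
    """Call `callback` once every name in `module_names` is in sys.modules.

    Hypothetical helper for illustration only. Presence in sys.modules
    does NOT guarantee that a module has finished initializing.
    """
    def check(tries_left):
        if all(name in sys.modules for name in module_names):
            callback()
        elif tries_left > 0:
            # Not there yet: check again after `interval` seconds.
            timer = threading.Timer(interval, check, args=(tries_left - 1,))
            timer.daemon = True  # do not keep the process alive just for polling
            timer.start()
        # Otherwise give up silently once the retries are used up.

    check(max_tries)
```

In sitecustomize.py this might be called as `run_when_loaded(["pyspark", "ipykernel"], my_hook)`, where `my_hook` stands for whatever performs the delayed import.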

Any ideas on how to hook in startup code that executes late during startup? A solution that is specific to pyspark and/or ipykernel would satisfy my needs.

asked Mar 29 '17 12:03 by Roland Weber


1 Answer

Hmmm, you don't really give many details about what errors you encounter.

I think the canonical way to customize startup behaviour for the IPython kernel is to set up a config file and set the exec_lines option.

For example, you would put the following in ~/.ipython/profile_default/ipython_config.py:

# sample ipython_config.py
c = get_config()

# lines of code to run at IPython startup
c.InteractiveShellApp.exec_lines = [
    'import numpy',
    'import scipy'
]
# files to run at IPython startup
c.InteractiveShellApp.exec_files = [
    'mycode.py',
    'fancy.ipy'
]
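
Applied to the question, a minimal sketch might look like the following. It assumes the coworker's module can be imported as `spark_kernel_hooks`, which is a made-up name for this example. Whether exec_lines runs late enough for the Spark 1.6 kernels is not something I have verified, so treat this as a starting point to test rather than a confirmed fix.

```python
# ipython_config.py (sketch; the module name below is hypothetical)
c = get_config()  # provided by IPython when it loads this config file

c.InteractiveShellApp.exec_lines = [
    # Replace with the real name of the module that wires
    # Spark events to ipykernel events.
    'import spark_kernel_hooks',
]
```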
answered Oct 05 '22 23:10 by Giannis Spiliopoulos