Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Disable built-in module import in embedded Python

Tags:

I'm embedding Python 3.6 in my application, and I want to disable import command in the scripts to prevent users to import any python built-in libraries. I'd like to use only the language itself and my own C++ defined modules.

Py_SetProgramName (L"Example");
Py_Initialize ();
PyObject* mainModule = PyImport_AddModule ("__main__");
PyObject* globals = PyModule_GetDict (mainModule);

// This should work
std::string script1 = "print ('example')";
PyRun_String (script1.c_str (), Py_file_input, globals, nullptr);

// This should not work
std::string script2 = "import random\n"
                      "print (random.randint (1, 10))\n";
PyRun_String (script2.c_str (), Py_file_input, globals, nullptr);

Py_Finalize ();

Do you know any way to achieve this?

like image 977
kovacsv Avatar asked Feb 26 '18 15:02

kovacsv


People also ask

Why is Python running my module when I import it?

This happens because when Python imports a module, it runs all the code in that module. After running the module it takes whatever variables were defined in that module, and it puts them on the module object, which in our case is salutations .

What is __ all __ in Python?

In the __init__.py file of a package __all__ is a list of strings with the names of public modules or other objects. Those features are available to wildcard imports. As with modules, __all__ customizes the * when wildcard-importing from the package.

What are the three ways to import modules in Python?

So there's four different ways to import: Import the whole module using its original name: pycon import random. Import specific things from the module: pycon from random import choice, randint. Import the whole module and rename it, usually using a shorter variable name: pycon import pandas as pd.


1 Answers

Python has a long history of being impossible to create a secure sandbox (see How can I sandbox Python in pure Python? as a starting point, then dive into an old python-dev discussion if you feel like it). Here are what I consider to be your best two options.

Pre-scan the code

Before executing anything, scan the code. You could do this in Python with the AST module and then walk the tree, or can likely get far enough with simpler text searches. This likely works in your scenario because you have restricted use cases - it doesn't generalize to truly arbitrary code.

What you are looking for in your case will be any import statements (easy), and any top-level variables (e.g., in a.b.c you care about a and likely a.b for a given a) that are not "approved". This will enable you to fail on any code that isn't clean before running it.

The challenge here is that even trivally obfuscated code will bypass your checks. For example, here are some ways to import modules given other modules or globals that a basic scan for import won't find. You would likely want to restrict direct access to __builtins__, globals, some/most/all names with __double_underscores__ and members of certain types. In an AST, these will unavoidably show up as top-level variable reads or attribute accesses.

getattr(__builtins__, '__imp'+'ort__')('other_module')

globals()['__imp'+'ort__']('other_module')

module.__loader__.__class__(
    "other_module",
    module.__loader__.path + '/../other_module.py'
).load_module()

(I hope it goes somewhat without saying, this is an impossible challenge, and why this approach to sandboxing has never fully succeeded. But it may be good enough, depending on your specific threat model.)

Runtime auditing

If you are in a position to compile your own Python runtime, you might consider using the (currently draft) PEP 551 hooks. (Disclaimer: I am the author of this PEP.) There are draft implementations against the latest 3.7 and 3.6 releases.

In essence, this would let you add hooks for a range of events within Python and determine how to respond. For example, you can listen to all import events and determine whether to allow or fail them at runtime based on exactly which module is being imported, or listen to compile events to manage all runtime compilation. You can do this from Python code (with sys.addaudithook) or C code (with PySys_AddAuditHook).

The Programs/spython.c file in the repo is a fairly thorough example of auditing from C, while doing it from Python looks more like this (taken from my talk about this PEP):

import sys

def prevent_bitly(event, args):
    if event == 'urllib.Request' and '://bit.ly/' in args[0]:
        print(f'WARNING: urlopen({args[0]}) blocked')
        raise RuntimeError('access to bit.ly is not allowed')

sys.addaudithook(prevent_bitly)

The downside of this approach is you need to build and distribute your own version of Python, rather than relying on a system install. However, in general this is a good idea if your application is dependent on embedding as it means you won't have to force users into a specific system configuration.

like image 161
Zooba Avatar answered Sep 19 '22 05:09

Zooba