Should the order of import statements matter when importing a .so?

I am getting the following import error when trying to load a Python module compiled with Boost.Python.

ImportError: /path/to/library/libxml2.so.2: symbol gzopen64, version ZLIB_1.2.3.3 not defined in file libz.so.1 with link time reference

Strangely, I don't see this error if this module is the first one to be imported. That is, if I import some other module and then this module, it fails with the import error. I'm not sure what's going wrong or how to debug it.

Edit: To exactly show the issue:

$ python -c 'import json, libMYBOOST_PY_LIB' # DOES NOT WORK!!!
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ImportError: path/to/xml_library/libxml2.so: symbol gzopen64, version ZLIB_1.2.3.3 not defined in file libz.so.1 with link time reference
$ python -c 'import libMYBOOST_PY_LIB, json' # WORKS NOW!!!
$

It's not just json; a few other modules also cause the same issue when imported before my module, e.g. urllib2.

asked Aug 09 '13 21:08 by balki


2 Answers

The order of import statements matters.

As documented in the Python language reference:

Once the name of the module is known (unless otherwise specified, the term “module” will refer to both packages and modules), searching for the module or package can begin. The first place checked is sys.modules, the cache of all modules that have been imported previously. If the module is found there then it is used in step (2) of import.
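A minimal sketch of the cache lookup described in the quote, using the standard json module purely as a stand-in for any module: the first import executes the module and stores it in sys.modules, and a later import of the same name just returns the cached object.

```python
import sys
import json  # first import: the module is executed and cached

# The cache maps module names to module objects.
cached = sys.modules["json"]

import json as json_again  # second import: served from sys.modules

print(cached is json, json_again is json)
```

Because later imports are served from the cache, whichever statement imports a name first decides which module object everyone else gets.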

Any module can change:

  • sys.modules - the cache of all modules previously imported
  • sys.path - the search path for modules

And they can change import hooks as well:

  • sys.meta_path
  • sys.path_hooks
  • sys.path_importer_cache

The import hooks can provide you the ability to load modules from zip files, any kind of archive files, from the network, etc.


import libMYBOOST_PY_LIB

This statement will certainly modify sys.modules, because the module and any Python modules it imports are added to the module cache. It may modify sys.path too. It is actually very common for frameworks (e.g. boost, zope, django, requests...) to ship with batteries included, that is, with a copy of the modules they depend on.

  • django ships with json
  • requests ships with urllib3

To see exactly what the library will load you can use:

python -v -c 'import libMYBOOST_PY_LIB'
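As a complement to python -v, here is a rough sketch that diffs sys.modules before and after an import. The helper name modules_loaded_by is hypothetical, and "json" is only a stand-in for the module being debugged. It reports Python-level modules only, not the shared objects the dynamic loader maps in.

```python
import importlib
import sys


def modules_loaded_by(name):
    """Import `name` and return the module names it added to sys.modules."""
    before = set(sys.modules)
    importlib.import_module(name)
    return sorted(set(sys.modules) - before)


# "json" is a stand-in; substitute the module you are debugging.
for mod in modules_loaded_by("json"):
    print(mod)
```

If the module was already imported earlier in the process, the list is empty, which is itself a hint that import order changes what gets loaded when.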
answered Sep 21 '22 23:09 by dnozay


The problem

The problem is with the operating system. A Linux library (dynamically linked shared object library) can depend on other libraries (which can again depend on other libraries and so on). If these dependent libraries are not correctly resolved, you get the error you describe.

What are shared libraries (so)

You can create a shared library by taking several object files and linking them together. The linker preserves a lot of metadata when creating a shared library:

  1. A relocation table
  2. A list of exported symbols (functions and variables that can be accessed by others)
  3. A list of imported symbols (functions and variables this library uses from other libraries)
  4. A list of file names of other libraries, that can be used to satisfy the imported symbols

When the library is used, the system loads it, adjusts the addresses listed in the relocation table, and then tries to resolve the imported symbols. To do this, the system first checks the libraries that are already loaded. If these do not satisfy all symbols, it looks for the file names listed in the library and checks whether a file with that name exists, whether it is a valid library, and whether it exports the needed symbol.
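The last check in that sequence (does a given library export a given symbol?) can be approximated from Python with ctypes. This is only a sketch: the helper name exports_symbol is hypothetical, and the library "z" and symbol gzopen64 are taken from the error message in the question. It does not check symbol versions such as ZLIB_1.2.3.3, and the lookup may also find the symbol in a dependency of the library.

```python
import ctypes
import ctypes.util


def exports_symbol(library, symbol):
    """Return True/False if `library` loads and `symbol` resolves through it,
    or None if the library cannot be found or loaded."""
    path = ctypes.util.find_library(library)
    if path is None:
        return None
    try:
        handle = ctypes.CDLL(path)
    except OSError:
        return None
    # Attribute access looks the symbol up in the loaded library
    # (dlsym on POSIX); a missing symbol raises AttributeError,
    # which hasattr() turns into False.
    return hasattr(handle, symbol)


print(exports_symbol("z", "gzopen64"))
```

Note that this tests whichever libz the loader finds by default, which may not be the copy your extension module ends up bound to.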

Usually the "system" here is the dynamic loader, which runs in user space, not in kernel space.

How can you check which libraries are used by a program

You can list the shared libraries that a library or executable depends on with the command ldd.
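If you prefer to drive ldd from a script, a small wrapper along these lines may help. The helper name list_dependencies is hypothetical, and the Python interpreter itself is used as the example target; point it at your .so instead. Only run ldd on files you trust.

```python
import shutil
import subprocess
import sys


def list_dependencies(path):
    """Return ldd's output lines for `path`, or [] if ldd is unavailable or fails."""
    if shutil.which("ldd") is None:
        return []
    result = subprocess.run(["ldd", path], capture_output=True, text=True)
    if result.returncode != 0:
        return []  # e.g. a statically linked executable
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


# sys.executable is a stand-in; pass the path of the library in question.
for line in list_dependencies(sys.executable):
    print(line)
```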

If you want to check a running process, try lsof and filter for *.so, and also check /proc/[pid]/maps.
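Reading /proc/[pid]/maps can also be done from Python. The following is a sketch for Linux only; the helper name loaded_shared_objects is hypothetical, and it assumes the usual maps layout where the sixth field, when present, is the path backing the mapping.

```python
import os


def loaded_shared_objects(pid="self"):
    """Return the sorted paths of shared objects mapped into a process (Linux only)."""
    maps_path = f"/proc/{pid}/maps"
    if not os.path.exists(maps_path):
        return []  # not Linux, or no such process
    paths = set()
    with open(maps_path) as maps:
        for line in maps:
            fields = line.split(None, 5)
            # The sixth field, when present, is the pathname of the mapping.
            if len(fields) == 6:
                path = fields[5].strip()
                if ".so" in os.path.basename(path):
                    paths.add(path)
    return sorted(paths)


for path in loaded_shared_objects():
    print(path)
```

Called with the default pid="self", it lists what the current interpreter has mapped so far; reading another process's maps may require matching permissions.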

How to debug your problem

In your case, pause the program just before the library in question is loaded (e.g. insert a read from the console or a sleep call). Then check which libraries are currently loaded. You should find that in the good case a library that exports the symbol in question is already loaded. In the error case that library is not loaded, and the system tries to load the wrong dependent library in the next step (e.g. a different version of the library, in which the needed symbol is missing).
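Instead of pausing the process and inspecting it from outside, the same comparison can be sketched from inside the interpreter. This assumes Linux; the helper names mapped_paths and report_import are hypothetical, "libz" is the library named in the error message, and "json" stands in for whichever module you import before your own.

```python
import importlib


def mapped_paths(substring):
    """Return mapped file paths containing `substring` (Linux only; empty elsewhere)."""
    try:
        with open("/proc/self/maps") as maps:
            lines = maps.read().splitlines()
    except OSError:
        return set()
    paths = set()
    for line in lines:
        fields = line.split(None, 5)
        if len(fields) == 6 and substring in fields[5]:
            paths.add(fields[5])
    return paths


def report_import(module_name, substring="libz"):
    """Import a module; return the matching paths mapped before it and those it added."""
    before = mapped_paths(substring)
    importlib.import_module(module_name)
    after = mapped_paths(substring)
    return before, after - before


# "json" is a stand-in; repeat with your own module in both import orders.
before, added = report_import("json")
print("already mapped:", sorted(before))
print("added by import:", sorted(added))
```

Running this once per import order and comparing which libz path shows up should tell you whether a different copy of the library is being picked up in the failing case.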

Is the order of the libraries important

Usually not, but it depends on the details. When the same library is needed in different versions, or when the system cannot resolve the shared library in all cases, the order might become important. Unfortunately these problems are quite hard to debug. On Windows you have DLL hell; on Linux it's similar with shared objects. Good luck debugging your problem!

answered Sep 17 '22 23:09 by stefan.schwetschke