I'm working on some python code that extracts some image data from an ECW file using GDAL (http://www.gdal.org/) and its python bindings. GDAL was built from source to have ECW support.
The program is run on a cluster server that I ssh into. I have tested the program through the ssh terminal and it runs fine. However, I would now like to submit a job to the cluster using qsub, but it reports the following:
Traceback (most recent call last):
File "./gdal-test.py", line 5, in <module>
from osgeo import gdal
File "/home/h3/ctargett/.local/lib/python2.6/site-packages/GDAL-1.11.1-py2.6-linux-x86_64.egg/osgeo/__init__.py", line 21, in <module>
_gdal = swig_import_helper()
File "/home/h3/ctargett/.local/lib/python2.6/site-packages/GDAL-1.11.1-py2.6-linux-x86_64.egg/osgeo/__init__.py", line 17, in swig_import_helper
_mod = imp.load_module('_gdal', fp, pathname, description)
ImportError: /mnt/aeropix/prgs/.local/lib/libgdal.so.1: undefined symbol: H5Eset_auto2
I did a bit more digging and tried using LD_DEBUG=symbols
to try and work out where the difference was, but that's about as far as my knowledge/understanding has got me.
For reference, here's what happens with LD_DEBUG=symbols
and running the code in the ssh terminal (piping through grep H5Eset_auto2
to reduce some of the output):
Symbol debug output for code running in ssh terminal:
11359: symbol=H5Eset_auto2; lookup in file=/usr/bin/python26 [0]
11359: symbol=H5Eset_auto2; lookup in file=/usr/lib64/libpython2.6.so.1.0 [0]
11359: symbol=H5Eset_auto2; lookup in file=/lib64/libpthread.so.0 [0]
11359: symbol=H5Eset_auto2; lookup in file=/lib64/libdl.so.2 [0]
11359: symbol=H5Eset_auto2; lookup in file=/lib64/libutil.so.1 [0]
11359: symbol=H5Eset_auto2; lookup in file=/lib64/libm.so.6 [0]
11359: symbol=H5Eset_auto2; lookup in file=/lib64/libc.so.6 [0]
11359: symbol=H5Eset_auto2; lookup in file=/lib64/ld-linux-x86-64.so.2 [0]
11359: symbol=H5Eset_auto2; lookup in file=/home/h3/ctargett/.local/lib/python2.6/site-packages/GDAL-1.11.1-py2.6-linux-x86_64.egg/osgeo/_gdal.so [0]
11359: symbol=H5Eset_auto2; lookup in file=/usr/lib64/libpython2.6.so.1.0 [0]
11359: symbol=H5Eset_auto2; lookup in file=/mnt/aeropix/prgs/.local/lib/libgdal.so.1 [0]
11359: symbol=H5Eset_auto2; lookup in file=/usr/lib64/libstdc++.so.6 [0]
11359: symbol=H5Eset_auto2; lookup in file=/lib64/libm.so.6 [0]
11359: symbol=H5Eset_auto2; lookup in file=/lib64/libgcc_s.so.1 [0]
11359: symbol=H5Eset_auto2; lookup in file=/lib64/libpthread.so.0 [0]
11359: symbol=H5Eset_auto2; lookup in file=/lib64/libc.so.6 [0]
11359: symbol=H5Eset_auto2; lookup in file=/lib64/libdl.so.2 [0]
11359: symbol=H5Eset_auto2; lookup in file=/lib64/libutil.so.1 [0]
11359: symbol=H5Eset_auto2; lookup in file=/mnt/aeropix/prgs/.local/lib/libhdf5.so.7 [0]
11359: symbol=H5Eset_auto2; lookup in file=/usr/bin/python26 [0]
11359: symbol=H5Eset_auto2; lookup in file=/usr/lib64/libpython2.6.so.1.0 [0]
11359: symbol=H5Eset_auto2; lookup in file=/lib64/libpthread.so.0 [0]
11359: symbol=H5Eset_auto2; lookup in file=/lib64/libdl.so.2 [0]
11359: symbol=H5Eset_auto2; lookup in file=/lib64/libutil.so.1 [0]
11359: symbol=H5Eset_auto2; lookup in file=/lib64/libm.so.6 [0]
11359: symbol=H5Eset_auto2; lookup in file=/lib64/libc.so.6 [0]
11359: symbol=H5Eset_auto2; lookup in file=/lib64/ld-linux-x86-64.so.2 [0]
11359: symbol=H5Eset_auto2; lookup in file=/home/h3/ctargett/.local/lib/python2.6/site-packages/GDAL-1.11.1-py2.6-linux-x86_64.egg/osgeo/_gdal.so [0]
11359: symbol=H5Eset_auto2; lookup in file=/usr/lib64/libpython2.6.so.1.0 [0]
11359: symbol=H5Eset_auto2; lookup in file=/mnt/aeropix/prgs/.local/lib/libgdal.so.1 [0]
11359: symbol=H5Eset_auto2; lookup in file=/usr/lib64/libstdc++.so.6 [0]
11359: symbol=H5Eset_auto2; lookup in file=/lib64/libm.so.6 [0]
11359: symbol=H5Eset_auto2; lookup in file=/lib64/libgcc_s.so.1 [0]
11359: symbol=H5Eset_auto2; lookup in file=/lib64/libpthread.so.0 [0]
11359: symbol=H5Eset_auto2; lookup in file=/lib64/libc.so.6 [0]
11359: symbol=H5Eset_auto2; lookup in file=/lib64/libdl.so.2 [0]
11359: symbol=H5Eset_auto2; lookup in file=/lib64/libutil.so.1 [0]
11359: symbol=H5Eset_auto2; lookup in file=/mnt/aeropix/prgs/.local/lib/libhdf5.so.7 [0]
Symbol debug output for code submitted using qsub:
16915: symbol=H5Eset_auto2; lookup in file=/usr/bin/python26 [0]
16915: symbol=H5Eset_auto2; lookup in file=/usr/lib64/libpython2.6.so.1.0 [0]
16915: symbol=H5Eset_auto2; lookup in file=/lib64/libpthread.so.0 [0]
16915: symbol=H5Eset_auto2; lookup in file=/lib64/libdl.so.2 [0]
16915: symbol=H5Eset_auto2; lookup in file=/lib64/libutil.so.1 [0]
16915: symbol=H5Eset_auto2; lookup in file=/lib64/libm.so.6 [0]
16915: symbol=H5Eset_auto2; lookup in file=/lib64/libc.so.6 [0]
16915: symbol=H5Eset_auto2; lookup in file=/lib64/ld-linux-x86-64.so.2 [0]
16915: symbol=H5Eset_auto2; lookup in file=/home/h3/ctargett/.local/lib/python2.6/site-packages/GDAL-1.11.1-py2.6-linux-x86_64.egg/osgeo/_gdal.so [0]
16915: symbol=H5Eset_auto2; lookup in file=/usr/lib64/libpython2.6.so.1.0 [0]
16915: symbol=H5Eset_auto2; lookup in file=/mnt/aeropix/prgs/.local/lib/libgdal.so.1 [0]
16915: symbol=H5Eset_auto2; lookup in file=/usr/lib64/libstdc++.so.6 [0]
16915: symbol=H5Eset_auto2; lookup in file=/lib64/libm.so.6 [0]
16915: symbol=H5Eset_auto2; lookup in file=/lib64/libgcc_s.so.1 [0]
16915: symbol=H5Eset_auto2; lookup in file=/lib64/libpthread.so.0 [0]
16915: symbol=H5Eset_auto2; lookup in file=/lib64/libc.so.6 [0]
16915: symbol=H5Eset_auto2; lookup in file=/lib64/libdl.so.2 [0]
16915: symbol=H5Eset_auto2; lookup in file=/lib64/libutil.so.1 [0]
16915: symbol=H5Eset_auto2; lookup in file=/mnt/aeropix/prgs/.local/lib/libhdf5.so.7 [0]
16915: symbol=H5Eset_auto2; lookup in file=/usr/lib64/libjpeg.so.62 [0]
16915: symbol=H5Eset_auto2; lookup in file=/usr/lib64/libpng12.so.0 [0]
16915: symbol=H5Eset_auto2; lookup in file=/usr/lib64/libpq.so.4 [0]
16915: symbol=H5Eset_auto2; lookup in file=/usr/lib64/libcurl.so.3 [0]
16915: symbol=H5Eset_auto2; lookup in file=/usr/lib64/libgssapi_krb5.so.2 [0]
16915: symbol=H5Eset_auto2; lookup in file=/usr/lib64/libkrb5.so.3 [0]
16915: symbol=H5Eset_auto2; lookup in file=/usr/lib64/libk5crypto.so.3 [0]
16915: symbol=H5Eset_auto2; lookup in file=/lib64/libcom_err.so.2 [0]
16915: symbol=H5Eset_auto2; lookup in file=/usr/lib64/libidn.so.11 [0]
16915: symbol=H5Eset_auto2; lookup in file=/lib64/libssl.so.6 [0]
16915: symbol=H5Eset_auto2; lookup in file=/lib64/libcrypto.so.6 [0]
16915: symbol=H5Eset_auto2; lookup in file=/mnt/aeropix/prgs/.local/lib/libNCSEcw.so.0 [0]
16915: symbol=H5Eset_auto2; lookup in file=/mnt/aeropix/prgs/.local/lib/libNCSEcwC.so.0 [0]
16915: symbol=H5Eset_auto2; lookup in file=/mnt/aeropix/prgs/.local/lib/libNCSCnet.so.0 [0]
16915: symbol=H5Eset_auto2; lookup in file=/mnt/aeropix/prgs/.local/lib/libNCSUtil.so.0 [0]
16915: symbol=H5Eset_auto2; lookup in file=/lib64/librt.so.1 [0]
16915: symbol=H5Eset_auto2; lookup in file=/usr/lib64/libxml2.so.2 [0]
16915: symbol=H5Eset_auto2; lookup in file=/mnt/aeropix/prgs/.local/lib/libz.so.1 [0]
16915: symbol=H5Eset_auto2; lookup in file=/lib64/ld-linux-x86-64.so.2 [0]
16915: symbol=H5Eset_auto2; lookup in file=/lib64/libcrypt.so.1 [0]
16915: symbol=H5Eset_auto2; lookup in file=/lib64/libresolv.so.2 [0]
16915: symbol=H5Eset_auto2; lookup in file=/lib64/libnsl.so.1 [0]
16915: symbol=H5Eset_auto2; lookup in file=/usr/lib64/libkrb5support.so.0 [0]
16915: symbol=H5Eset_auto2; lookup in file=/lib64/libkeyutils.so.1 [0]
16915: symbol=H5Eset_auto2; lookup in file=/lib64/libselinux.so.1 [0]
16915: symbol=H5Eset_auto2; lookup in file=/lib64/libsepol.so.1 [0]
16915: /mnt/aeropix/prgs/.local/lib/libgdal.so.1: error: symbol lookup error: undefined symbol: H5Eset_auto2 (fatal)
ImportError: /mnt/aeropix/prgs/.local/lib/libgdal.so.1: undefined symbol: H5Eset_auto2
I guess I'm not sure why it seems to stop looking in libgdal.so.1 when submitted using qsub, when it continues to look when just run in the terminal. I also note that the qsub job is able to correctly locate libhdf5.so.7
(which is where it should find H5Eset_auto2
) as it can find a different symbol, H5Eprint
:
16915: symbol=H5Eprint; lookup in file=/usr/lib64/libpython2.6.so.1.0 [0]
16915: symbol=H5Eprint; lookup in file=/mnt/aeropix/prgs/.local/lib/libgdal.so.1 [0]
16915: symbol=H5Eprint; lookup in file=/usr/lib64/libstdc++.so.6 [0]
16915: symbol=H5Eprint; lookup in file=/lib64/libm.so.6 [0]
16915: symbol=H5Eprint; lookup in file=/lib64/libgcc_s.so.1 [0]
16915: symbol=H5Eprint; lookup in file=/lib64/libpthread.so.0 [0]
16915: symbol=H5Eprint; lookup in file=/lib64/libc.so.6 [0]
16915: symbol=H5Eprint; lookup in file=/lib64/libdl.so.2 [0]
16915: symbol=H5Eprint; lookup in file=/lib64/libutil.so.1 [0]
16915: symbol=H5Eprint; lookup in file=/mnt/aeropix/prgs/.local/lib/libhdf5.so.7 [0]
Any pointers on this would be incredibly useful at this stage (I hope that's enough information - I'm more than happy to provide more information, I'm just not sure what else might be useful at this stage).
EDIT:
It seems that the contents of /usr/bin
are different for jobs submitted using qsub
(specifically libtool
is missing). This is being investigated.
The error message above occurs when you have an incorrect LD_LIBRARY_PATH setting. When a Platform MPI application starts, it will search for a dynamic library called libmpi.so. Note that this is a common library name and that many MPI distributions use this name.
A symbol remains undefined when a symbol reference in a relocatable object is never matched to a symbol definition. Similarly, if a shared object is used to create a dynamic executable and leaves an unresolved symbol definition, an undefined symbol error results.
After two dozens of comments to understand the situation, it was found that the libhdf5.so.7
was actually a symlink (with several levels of indirection) to a file that was not shared between the queued processes and the interactive processes. This means even though the symlink itself lies on a shared filesystem, the contents of the file do not and as a result the process was seeing different versions of the library.
For future reference: other than checking LD_LIBRARY_PATH
, it's always a good idea to check a library with nm -D
to see if the symbols actually exist. In this case it was found that they do exist in interactive mode but not when run in the queue. A quick md5sum
revealed that the files were actually different.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With