Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Undefined symbols when using user operator in tensorflow-gpu>=1.15

everybody. I wrote some user operators to extend tensorflow and tried to use CMake to compile the code to different shared libraries to fit different versions of tensorflow.

It works fine with tensorflow-gpu<=1.14 but not with 1.15 and 2.0. I got the following error when loading the library.

tensorflow.python.framework.errors_impl.NotFoundError: build/lib/libtensorflow_ctext.so: undefined symbol: _ZN10tensorflow12OpDefBuilder4AttrENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE

I tried nm build/lib/libtensorflow_ctext.so on 1.14 version and 2.0 version, both shared libraries have this undefined symbol in the middle.

U _ZN10tensorflow12OpDefBuilder4AttrENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE

It seems that the program is going to find this symbol in the linked Tensorflow framework library libtensorflow_framework.so. I searched libtensorflow_framework.so.2 for similar symbols and found several of them.

0000000000cacc50 T _ZN10tensorflow12OpDefBuilder10DeprecatedEiSs
0000000000cace00 T _ZN10tensorflow12OpDefBuilder10SetShapeFnESt8functionIFNS_6StatusEPNS_15shape_inference16InferenceContextEEE
0000000000cacb20 T _ZN10tensorflow12OpDefBuilder13ControlOutputESs
0000000000cac980 T _ZN10tensorflow12OpDefBuilder13SetIsStatefulEv
0000000000cac970 T _ZN10tensorflow12OpDefBuilder14SetIsAggregateEv
0000000000cac960 T _ZN10tensorflow12OpDefBuilder16SetIsCommutativeEv
0000000000cac990 T _ZN10tensorflow12OpDefBuilder27SetAllowsUninitializedInputEv
0000000000cacb50 T _ZN10tensorflow12OpDefBuilder3DocESs
0000000000caca90 T _ZN10tensorflow12OpDefBuilder4AttrESs
0000000000cacac0 T _ZN10tensorflow12OpDefBuilder5InputESs
0000000000cacaf0 T _ZN10tensorflow12OpDefBuilder6OutputESs
0000000000cac830 T _ZN10tensorflow12OpDefBuilderC1ESs
0000000000cac830 T _ZN10tensorflow12OpDefBuilderC2ESs
0000000000c702d0 W _ZN10tensorflow12OpDefBuilderD1Ev
0000000000c702d0 W _ZN10tensorflow12OpDefBuilderD2Ev

The symbol _ZN10tensorflow12OpDefBuilder4AttrESs looks very similar but different in the last several letters. I don't really know what those "ESs"s and "ENSt7"s stand for.

Hints on how I could debug it are very appreciated. Here is the command to build my shared library (generated by cmake)

g++ -fPIC   -shared -Wl,-soname,libtensorflow_ctext.so -o lib/libtensorflow_ctext.so src/CMakeFiles/bp_par_2d.dir/bp_par_2d.cc.o src/CMakeFiles/bp_par_2d_sv.dir/bp_par_2d_sv.cc.o src/CMakeFiles/fp_par_2d.dir/fp_par_2d.cc.o src/CMakeFiles/filter.dir/filter.cc.o cuda/CMakeFiles/bp_par_2d_cu.dir/bp_par_2d.cu.o cuda/CMakeFiles/bp_par_2d_sv_cu.dir/bp_par_2d_sv.cu.o cuda/CMakeFiles/fp_par_2d_cu.dir/fp_par_2d.cu.o cuda/CMakeFiles/filter_cu.dir/filter.cu.o tensorflow/CMakeFiles/bp_par_2d_ops.dir/bp_par_2d_ops.cu.o tensorflow/CMakeFiles/bp_par_2d_sv_ops.dir/bp_par_2d_sv_ops.cu.o tensorflow/CMakeFiles/fp_par_2d_ops.dir/fp_par_2d_ops.cu.o tensorflow/CMakeFiles/ramp_filter_ops.dir/ramp_filter_ops.cu.o CMakeFiles/tensorflow_ctext.dir/cmake_device_link.o  -L/usr/lib/x86_64-linux-gnu/stubs -Wl,-rpath,/home/ltl/anaconda3/envs/tf_test/lib/python3.7/site-packages/tensorflow_core /home/ltl/anaconda3/envs/tf_test/lib/python3.7/site-packages/tensorflow_core/libtensorflow_framework.so.2 -lcudadevrt -lcudart_static -lrt -lpthread -ldl 
like image 381
Tianling Lv Avatar asked Dec 06 '19 00:12

Tianling Lv


Video Answer


1 Answers

Well, this problem is solved.

I used nm -C instruction to look inside the .so files and found that in Tensorflow>=1.15.0, the function is defined as

0000000000caca90 T tensorflow::OpDefBuilder::Attr(std::string)

while in Tensorflow<=1.14.0, the function is defined as

0000000000c96ed0 T tensorflow::OpDefBuilder::Attr(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)

So, they use different settings on _GLIBCXX_USE_CXX11_ABI when compiling the shared library.

In order to be consistant and avoid those undefined symbol problems, I need to define -D_GLIBCXX_USE_CXX11_ABI=1 for early versions of Tensorflow and define -D_GLIBCXX_USE_CXX11_ABI=0 for later versions.

like image 153
Tianling Lv Avatar answered Oct 27 '22 00:10

Tianling Lv