Breaking strict aliasing and living to tell about it?

I am trying to use two libraries, LIBSVM and LIBLINEAR, in the same application that I am writing in C++11. Both LIBSVM and LIBLINEAR take their input in what is essentially a row-based sparse matrix representation: there is a node structure

struct svm_node
{
    int index;
    double value;
};

and the sparse matrix itself is just a struct svm_node **, where every row is a struct svm_node * and each row is terminated by a node with index = -1. The LIBLINEAR version of this struct is called feature_node and has an identical definition. Although LIBSVM and LIBLINEAR are written by the same authors, svm.h and linear.h, and consequently struct svm_node and struct feature_node, are in no way related.
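To make the layout described above concrete, here is a minimal sketch of a two-row sparse matrix in that representation. The struct is repeated from svm.h so the snippet stands alone; the helpers make_row0, make_row1, row_pointers and row_length are hypothetical names invented for this illustration and are not part of LIBSVM or LIBLINEAR.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Same definition as quoted above, repeated so the sketch is self-contained.
struct svm_node
{
    int index;
    double value;
};

// Hypothetical helper: number of entries in a row, not counting the
// terminating node whose index is -1.
std::size_t row_length(const svm_node *row)
{
    assert(row != nullptr);
    std::size_t n = 0;
    while (row[n].index != -1)
        ++n;
    return n;
}

// Hypothetical helpers: two example rows, each ending with index = -1.
std::vector<svm_node> make_row0()
{
    std::vector<svm_node> row = {{1, 0.5}, {4, 2.0}, {-1, 0.0}};
    return row;
}

std::vector<svm_node> make_row1()
{
    std::vector<svm_node> row = {{2, 1.5}, {-1, 0.0}};
    return row;
}

// Hypothetical helper: collects the row pointers, so that ptrs.data()
// has the svm_node ** shape described in the question.
std::vector<svm_node *> row_pointers(std::vector<svm_node> &r0,
                                     std::vector<svm_node> &r1)
{
    std::vector<svm_node *> ptrs;
    ptrs.push_back(r0.data());
    ptrs.push_back(r1.data());
    return ptrs;
}
```

The vectors returned by make_row0 and make_row1 must outlive the pointer array, since row_pointers only stores addresses into them.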

There are some cases where I would like to create a kernel SVM model (implemented by LIBSVM only) and a logistic regression model (implemented by LIBLINEAR only) of my data. The data set, which is passed to the libraries in their respective (and, at the binary level, identical) sparse matrix representations, may be quite large, and I would prefer to avoid memcpy()ing it all. A simple reinterpret_cast<feature_node **>(svm_node_ptr_ptr_variable) seems to do the job just fine.

I am also using LLVM's full-program optimization (-flto) in release builds, so I would like to ensure that no optimization breaks my code in an unpredictable manner.

Is there any way to type-pun svm_node ** into feature_node ** that avoids any breakage which may be caused by (current or future) compiler optimizations? Does __attribute__((__may_alias__)) help here, and if it does, how should I use it?


If __attribute__((__may_alias__)) is only meaningful on types, would it work if I created my own struct and pointer-to-struct

struct __attribute__((__may_alias__)) SparseElement {
    int index;
    double value;
};
typedef SparseElement * __attribute__((__may_alias__)) SparseRow;

and then passed a reinterpret_cast'ed SparseRow * to LIBSVM and LIBLINEAR?

Kristóf Marussy asked Aug 05 '14 10:08



1 Answer

The LIBLINEAR version of this struct is called feature_node and has identical definition.

You're golden if you use a union. C++ specifically allows (section 9.2) accessing "a common initial sequence":

If a standard-layout union contains two or more standard-layout structs that share a common initial sequence, and if the standard-layout union object currently contains one of these standard-layout structs, it is permitted to inspect the common initial part of any of them. Two standard-layout structs share a common initial sequence if corresponding members have layout-compatible types and either neither member is a bit-field or both are bit-fields with the same width for a sequence of one or more initial members.
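The quoted rule can be sketched as follows. The two structs are repeated from the question; the union NodeView and the function read_as_feature_node are hypothetical names introduced only for this illustration. The sketch assumes both structs are standard-layout, which the static_assert checks.

```cpp
#include <type_traits>

// Definitions repeated from the question so the sketch is self-contained.
struct svm_node
{
    int index;
    double value;
};

struct feature_node
{
    int index;
    double value;
};

static_assert(std::is_standard_layout<svm_node>::value &&
                  std::is_standard_layout<feature_node>::value,
              "the common-initial-sequence rule needs standard-layout structs");

// Hypothetical union holding either view of one node.
union NodeView
{
    svm_node svm;
    feature_node linear;
};

// Stores the node as svm_node, then inspects the common initial sequence
// (here, both members) through the feature_node view.
feature_node read_as_feature_node(int index, double value)
{
    NodeView u = {svm_node{index, value}}; // svm is the active member
    feature_node out = {u.linear.index, u.linear.value};
    return out;
}
```

Note that this only covers nodes that actually live inside such a union; it illustrates the rule rather than converting an existing svm_node ** array.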

Even a plain reinterpret_cast on the pointer should work fine, since the lvalue-to-rvalue conversion is applied to the member being read (an int or a double), and that is the exact type of the object that exists in memory there.
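A minimal sketch of the cast, assuming the two definitions really are identical: the static_asserts turn that assumption into a compile-time check, so a future change to either header fails the build instead of silently misreading memory. The function as_feature_rows is a hypothetical name, not part of either library, and the checks verify layout only; they are not a guarantee from the standard about aliasing.

```cpp
#include <cstddef>
#include <type_traits>

// Definitions repeated from the question so the sketch is self-contained.
struct svm_node
{
    int index;
    double value;
};

struct feature_node
{
    int index;
    double value;
};

// Compile-time checks that the two layouts match.
static_assert(std::is_standard_layout<svm_node>::value &&
                  std::is_standard_layout<feature_node>::value,
              "both node types must be standard-layout");
static_assert(sizeof(svm_node) == sizeof(feature_node),
              "node sizes differ");
static_assert(alignof(svm_node) == alignof(feature_node),
              "node alignments differ");
static_assert(offsetof(svm_node, index) == offsetof(feature_node, index),
              "index offsets differ");
static_assert(offsetof(svm_node, value) == offsetof(feature_node, value),
              "value offsets differ");

// Hypothetical helper: reinterprets the row-pointer array without copying.
feature_node **as_feature_rows(svm_node **rows)
{
    return reinterpret_cast<feature_node **>(rows);
}
```

Keeping the cast in one helper means there is a single place to revisit if a compiler upgrade ever changes how the punned rows behave.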

Ben Voigt answered Sep 28 '22 08:09
