Compile-time string compression with C++17 and earlier

Tags:

c++

c++17

I have an application that uses strings with long chains of repeated characters. I want to add them to the binary in compressed/obfuscated form. I'm currently using a modified RLE algorithm for simplicity.

The algorithm below works with C++20. Unfortunately, I now have to support C++17 as well for business reasons. My current C++17 workaround is to keep the strings in a YAML file and generate the corresponding "compressed" .cpp files at build time, which are then linked into the process.

While researching, I found this solution, which uses Huffman coding but requires C++20 or later.

I have also seen this solution, but its "compressed" data is the same size as the raw data.

So the question is: how can I rewrite the following algorithm for C++17?

#include <algorithm>
#include <array>
#include <cstddef>
#include <iostream>
#include <sstream>
#include <string>

struct Array {
    const char* data;
    std::size_t size;
};

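// RLE-compresses `size` bytes from `data` into `buf`; returns the number of bytes written.
// Runs of 4 to 255 equal bytes are stored as {0, byte, count}; shorter runs are copied verbatim.
constexpr std::size_t compress( const char* data, std::size_t size, char* buf ) {
    if ( size==0 ) return 0;
    std::size_t offset = 0;
    char lastch = data[0];
    std::size_t counter = 1;
    // Flushes the current run to the output buffer.
    auto push = [&]() {
        if ( counter <= 3 ) {
            for ( std::size_t j=0; j<counter; ++j ) buf[offset++] = lastch;
        }
        else {
            buf[offset++] = 0;
            buf[offset++] = lastch;
            buf[offset++] = static_cast<char>(counter);
        }
        counter = 0;
    };
    for ( std::size_t j=1; j<size; ++j ) {
        // A run ends when the byte changes or the count no longer fits in one byte.
        if ( (data[j]!=lastch) || (counter==255) ) {
            push();
            lastch = data[j];
        }
        counter++;
    }
    push();
    return offset;
}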

template< std::size_t N > 
struct RawContainer {
    char raw_data[N];
    constexpr RawContainer( const char (&s)[N] ) {
        std::copy(s,s+N,raw_data);
    }
    constexpr operator const char* () const noexcept {
        return raw_data;
    }
    constexpr auto data() const noexcept {
        return raw_data;
    }
    constexpr auto size() const noexcept {
        return N;
    }
};

template< auto Container >
struct StringCompressor {
    constexpr StringCompressor() noexcept : compressed_data{} {
        compress(Container.data(),Container.size(),compressed_data.data());
    }
    constexpr static auto build_size() noexcept {
        char out[Container.size()*3];
        return compress(Container.data(),Container.size(),out);
    }
    std::string str() noexcept {
        std::ostringstream out;
        out << compressed_data.size() << ": ";
        for ( std::size_t j=0; j<compressed_data.size(); ++j ) {
            out << (int)compressed_data[j] << " ";
        }
        return out.str();
    }
    std::array<char,build_size()> compressed_data;
};

template<RawContainer str>
constexpr StringCompressor<str> operator ""_x() noexcept
{
    return StringCompressor<str>();
}



auto value = "aaaabbbbbbbbbbbbbbbbbbbc"_x;

int main() {
    std::cout << value.str() << std::endl;
}

Godbolt: https://godbolt.org/z/Eh9fMxW75

Note: the decompression algorithm was not included for simplicity.
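
For reference, a decoder for this format could look roughly like the sketch below. It is not the actual decompression code from the application: the name decompress is hypothetical, and it assumes the original text contains no NUL bytes, since 0 serves as the escape marker. A lone 0 at the very end is treated as the string literal's terminator, which compress() above copies through because N includes it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical decoder for the format written by compress():
//   {0, ch, count} -> ch repeated count times
//   any other byte -> copied as-is
// Assumption: the original text contains no NUL bytes.
std::string decompress(const char* data, std::size_t size) {
    assert(data != nullptr || size == 0);
    std::string out;
    std::size_t i = 0;
    while (i < size) {
        if (data[i] == 0 && i + 2 < size) {
            // Escape marker followed by the byte and its repeat count.
            const char ch = data[i + 1];
            const std::size_t count = static_cast<unsigned char>(data[i + 2]);
            out.append(count, ch);
            i += 3;
        } else if (data[i] == 0) {
            // A 0 without two following bytes cannot start a run:
            // treat it as the copied string terminator and stop.
            break;
        } else {
            out.push_back(data[i]);
            ++i;
        }
    }
    return out;
}
```

The count is read through unsigned char so that a run of 255 is not misread as -1 on platforms where char is signed.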

asked Oct 12 '25 23:10 by Fred Helmers


1 Answer

Here is my version. The compress function is copied from your code unchanged; the rest is lifted from my comment with some trivial modifications. Usage is simple:

COMPRESSED_LITERAL("aaaabbbbbbbbbbbbbc")

As you can see, neither the original string nor the compression code leaves any trace in the generated object.

I didn't implement str() because the compressed string is visible in the generated assembly. Both the compressed and the uncompressed versions are written to std::cout only so that the generated assembly can be compared side by side; the compressed one prints nothing, of course, because of the embedded zero at the beginning.
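
If you do want to inspect the compressed bytes at run time, something along these lines should work. This is only a sketch: dump_bytes and as_c_string are hypothetical helper names, and the second function exists just to show why streaming the buffer as a C string prints nothing.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper: formats every byte as its numeric value,
// separated by spaces, so embedded zeros stay visible.
std::string dump_bytes(const char* data, std::size_t size) {
    assert(data != nullptr || size == 0);
    std::ostringstream out;
    for (std::size_t i = 0; i < size; ++i) {
        if (i != 0) out << ' ';
        out << static_cast<int>(static_cast<unsigned char>(data[i]));
    }
    return out.str();
}

// Streams the buffer as a C string, which stops at the first NUL byte.
// A buffer that starts with the 0 escape marker therefore yields "".
std::string as_c_string(const char* data) {
    std::ostringstream out;
    out << data;
    return out.str();
}
```

Passing the size explicitly is what makes the difference here: the stream operator for const char* has no way to know that the data continues past the first zero.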

The only downside I can think of is that you need to use a macro.
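
For readers without access to the linked code, here is one way such a macro could be built in C++17. It is a sketch under stated assumptions, not necessarily the exact code referred to above: rle_compress, PackedBytes, packed_size, pack and PackedHolder are hypothetical names, rle_compress is a lambda-free rewrite that should produce the same format as compress, the literal must be non-empty, and the terminating NUL is not compressed. The macro wraps the literal in a local struct so that it can be passed to templates as a type, which C++17 allows even though it has no class-type non-type template parameters.

```cpp
#include <cstddef>

// Runs of 4..255 equal bytes become {0, byte, count};
// shorter runs are copied verbatim.
constexpr std::size_t rle_compress(const char* in, std::size_t n, char* out) {
    std::size_t written = 0;
    std::size_t i = 0;
    while (i < n) {
        std::size_t run = 1;
        while (i + run < n && in[i + run] == in[i] && run < 255) ++run;
        if (run <= 3) {
            for (std::size_t k = 0; k < run; ++k) out[written++] = in[i];
        } else {
            out[written++] = 0;
            out[written++] = in[i];
            out[written++] = static_cast<char>(run);
        }
        i += run;
    }
    return written;
}

// Fixed-size holder for the compressed bytes.
template <std::size_t N>
struct PackedBytes {
    char bytes[N];
    constexpr std::size_t size() const noexcept { return N; }
};

// First pass: compress into a scratch buffer just to learn the size.
// The output is never longer than the input, so size() bytes suffice.
template <class Src>
constexpr std::size_t packed_size() {
    char scratch[Src::size()] = {};  // C++17 requires the initializer
    return rle_compress(Src::data(), Src::size(), scratch);
}

// Second pass: compress into an exactly sized buffer.
template <class Src>
constexpr PackedBytes<packed_size<Src>()> pack() {
    PackedBytes<packed_size<Src>()> result{};
    rle_compress(Src::data(), Src::size(), result.bytes);
    return result;
}

// One constant-initialized object per distinct source type.
// Static constexpr data members are implicitly inline in C++17.
template <class Src>
struct PackedHolder {
    static constexpr PackedBytes<packed_size<Src>()> value = pack<Src>();
};

// The local struct turns the literal into a type usable as a template argument.
#define COMPRESSED_LITERAL(literal)                                          \
    ([]() -> const auto& {                                                   \
        struct Src {                                                         \
            static constexpr const char* data() noexcept { return literal; } \
            static constexpr std::size_t size() noexcept {                   \
                return sizeof(literal) - 1;                                  \
            }                                                                \
        };                                                                   \
        static_assert(sizeof(literal) > 1, "literal must not be empty");     \
        return PackedHolder<Src>::value;                                     \
    }())
```

With this sketch, COMPRESSED_LITERAL("aaaaaabbc") evaluates to a reference to a 6-byte object holding 0, 'a', 6, 'b', 'b', 'c'. Whether the original literal really disappears from the object file depends on the compiler and optimization level, so it is worth checking the generated assembly.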

answered Oct 14 '25 12:10 by n. 1.8e9-where's-my-share m.