Store a non-NUL-terminated C string constant in C++

Before anyone says "DON'T DO THIS, it is really bad":

  1. I understand the reasons for having a NUL terminated string.
  2. I know one can write something like
    char mystr[] = { 'm', 'y', ' ', 's', 't', 'r', 'i', 'n', 'g'};
    However, the convenience of the C-string representation is too great.
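To make the difference in item 2 concrete: a brace-initialized array holds exactly the characters listed, while a string-literal initializer adds the trailing NUL. A minimal compile-time check (the name `mystr_lit` is only for illustration):

```cpp
// Brace initialization: exactly the 9 listed characters, no trailing NUL.
constexpr char mystr[] = { 'm', 'y', ' ', 's', 't', 'r', 'i', 'n', 'g' };

// String-literal initialization: the compiler appends '\0', giving 10 bytes.
constexpr char mystr_lit[] = "my string";

static_assert(sizeof(mystr) == 9, "brace-initialized array has no NUL");
static_assert(sizeof(mystr_lit) == 10, "string literal carries a NUL");
static_assert(mystr_lit[9] == '\0', "the extra byte is the terminator");
```

Note that C also accepts `char s[9] = "my string";` and silently drops the NUL when the array is exactly the length of the characters, but C++ rejects that initializer as too long, which is why the convenient literal form always costs the extra byte in C++.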

The rationale for this is that I'm programming for a microcontroller and I need to store data in program memory. Some of the data is in the form of bytes, words, dwords and floats. I'd like the data to include strings stored contiguously, without the NUL.

I've tried templates that take <size_t N, char* A> and <size_t N, char (&A)[N]> as parameters in order to traverse the array and store its contents in a static array, but I can't seem to get it right. I think the standard may actually disallow this, which is understandable in the general case but unfortunate in specific cases (specifically, this one ;) :( ).

If I could remap the string as something like a boost::mpl::vector_c<char, ...> template, that would be better, as I have other code that will store it properly. However, dereferencing an array from within a template to produce a constant template argument appears to be disallowed too.

Any ideas?

EDIT:

Pseudocode example (this is somewhat contrived, as the real code is much larger. I probably wouldn't read byte by byte like this, nor would I use a literal to iterate to the end of the string; that would be embedded in the data somewhere as well.):

// this stores bytes in an array
template<typename X, typename T, T ...numbers>
struct x
{
  static PROGMEM volatile const T data[];
};
template<typename X, typename T, T ...numbers>
PROGMEM volatile const T x<X, T, numbers...>::data[] = { numbers... };

int main()
{
  // this will not work, but the idea is you have byte 0 as 1,
  // byte 1 as 2, byte 2 as 3, byte 3 as 's', byte 4 as 'o', ...,
  // byte 22 as 'g', byte 23 as 4, byte 24 as 5, byte 25 as 6.
  typedef x<int, char, 1,2,3,"some embedded string",4,5,6> xx;
  // print the 20 characters of the string, which start at byte 3
  for(int i=0; i<20; ++i)
    Serial.print((char)pgm_read_byte_near(&xx::data[0] + 3 + i));
  return 0;
}

Also note that I am not using full C++11; this is C++0x (the pre-standard draft), possibly with extensions.
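For reference, here is a sketch of the layout the pseudocode is after (1, 2, 3, then the 20 characters, then 4, 5, 6 in one contiguous array). It assumes a desktop C++17 compiler rather than the C++0x targeted here, leaves out `PROGMEM`/`pgm_read_byte_near`, and the helpers `without_nul` and `concat` are hypothetical names, not part of any library:

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper: copy a string literal into a std::array, dropping the NUL.
// Assumes s is a string literal (its last element is the terminator).
template <std::size_t N>
constexpr std::array<char, N - 1> without_nul(const char (&s)[N])
{
    std::array<char, N - 1> out{};
    for (std::size_t i = 0; i + 1 < N; ++i)
        out[i] = s[i];
    return out;
}

// Hypothetical helper: join two arrays into one contiguous array.
template <std::size_t A, std::size_t B>
constexpr std::array<char, A + B> concat(const std::array<char, A>& a,
                                         const std::array<char, B>& b)
{
    std::array<char, A + B> out{};
    for (std::size_t i = 0; i < A; ++i) out[i] = a[i];
    for (std::size_t i = 0; i < B; ++i) out[A + i] = b[i];
    return out;
}

// Bytes 0-2 are 1, 2, 3; bytes 3-22 are the string; bytes 23-25 are 4, 5, 6.
constexpr auto layout = concat(concat(std::array<char, 3>{1, 2, 3},
                                      without_nul("some embedded string")),
                               std::array<char, 3>{4, 5, 6});
```

Everything is computed at compile time, so `layout` is a 26-byte constant with no NUL between the string and the trailing bytes.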

asked May 10 '13 at 13:05 by Adrian


2 Answers

Third try

magic and trickery

If you were using C++11 (I know, but in its absence I think code generation is your best bet), it feels like a user-defined literal should be able to handle this. E.g., with:

#include <array>

template <char... RAW>
inline constexpr std::array<char, sizeof...(RAW)> operator "" _fixed() {
    return std::array<char, sizeof...(RAW)>{RAW...};
}

it would be nice if this worked:

const std::array<char, 7> goodbye = goodbye_fixed;

... but sadly it doesn't: goodbye_fixed is just an identifier, and the character-pack form of a literal operator template is only used for numeric literals (presumably for parsing reasons). Using "goodbye"_fixed doesn't work either, as that requires an operator "" _fixed(const char *s, std::size_t length) overload, and there the compile-time array has decayed to a pointer again.
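To show the numeric restriction from the other side: the same `_fixed` operator does accept a numeric literal, because for integer and floating-point literals the compiler passes each character of the literal's spelling as a template argument. A minimal sketch, assuming C++17:

```cpp
#include <array>

// Literal operator template: each character of a numeric literal's
// spelling arrives as one element of the RAW pack.
template <char... RAW>
constexpr std::array<char, sizeof...(RAW)> operator""_fixed()
{
    return std::array<char, sizeof...(RAW)>{RAW...};
}

// 12345_fixed is a numeric literal, so the template form is selected:
// the result holds '1' '2' '3' '4' '5' with no trailing NUL.
constexpr auto digits = 12345_fixed;
```

This only helps for text that happens to be spelled as a valid number, which is why it does not solve the general string case.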

Eventually we come down to invoking this:

const auto goodbye = operator "" _fixed <'g','o','o','d','b','y','e'>();

and it's no better than the ugly first version. Any other ideas?


Second try

auto-generate the ugliness

I think you're right that you can't easily intercept the string literal mechanism. Honestly, the usual approach would be to use a build tool to generate the ugly code for you in a separate file (cf. internationalization libraries, for example).

E.g., you write

fixed_string hello = "hello";

or something similar in a dedicated file, and the build system generates a header

extern const std::array<char, 5> hello;

and a .cpp file containing the ugly initialization from the first try below.
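A sketch of what such a generator might emit, assuming a hypothetical `emit_fixed_string` function run at build time (it is not an existing tool): the header gets an `extern` declaration and the source file gets the brace-initialized definition.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical output of the generator for one string.
struct generated
{
    std::string header;
    std::string source;
};

// Hypothetical build-time helper. Note: quotes and backslashes in text
// are not escaped in this sketch.
generated emit_fixed_string(const std::string& name, const std::string& text)
{
    assert(!name.empty());
    const std::string type =
        "std::array<char, " + std::to_string(text.size()) + ">";

    generated out;
    out.header = "extern const " + type + " " + name + ";\n";

    // One character literal per byte; no '\0' is ever appended.
    out.source = "extern const " + type + " " + name + " = {";
    for (std::size_t i = 0; i < text.size(); ++i) {
        out.source += (i ? ", '" : " '");
        out.source += text[i];
        out.source += '\'';
    }
    out.source += " };\n";
    return out;
}
```

For `emit_fixed_string("hello", "hello")` the source line is `extern const std::array<char, 5> hello = { 'h', 'e', 'l', 'l', 'o' };`, i.e. the same ugly initialization you would otherwise write by hand.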


First try

missed the "looks like a string literal" requirement

I've tried templates ...

like this?

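#include <array>
const std::array<char, 5> hello = { 'h', 'e', 'l', 'l', 'o' };

#include <cstdio>
int main()
{
    // %.*s prints exactly hello.size() characters, so no NUL is needed;
    // the precision argument must be an int.
    std::printf("%.*s\n", static_cast<int>(hello.size()), &hello.front());
    return 0;
}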

If you don't have C++11, Boost.Array will work, or you can roll your own. Note that this is just a type wrapper around const char[5], so it should be fine to put in the data segment (I've confirmed it goes in .rodata with my local gcc).
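Rolling your own can be as small as an aggregate around a plain array. A sketch with the hypothetical name `fixed_array`, written for C++11 (drop the `constexpr` keywords for a C++03 compiler):

```cpp
#include <cstddef>

// Hypothetical minimal stand-in for std::array / boost::array.
// Being an aggregate, it can be brace-initialized and needs no
// constructor, so a const instance can be statically initialized.
template <typename T, std::size_t N>
struct fixed_array
{
    T elems[N];

    constexpr std::size_t size() const { return N; }
    constexpr const T& operator[](std::size_t i) const { return elems[i]; }
    constexpr const T* data() const { return elems; }
};

// Exactly five bytes of payload; no trailing NUL.
constexpr fixed_array<char, 5> hello = { { 'h', 'e', 'l', 'l', 'o' } };
```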

answered Sep 28 '22 at 16:09 by Useless


I actually lost track of this Q and I don't know if I can find the original code I was working with back then, but I have figured out how to store a string without its terminating NUL character.

In C++17 I was able to use a constexpr function to fill a std::array<char, n> with the characters of a string, without the trailing NUL.

#include <array>
#include <cstdio>

constexpr size_t str_len(char const * x)
{
    char const * begin = x;
    while (*x) {
        ++x;
    }
    return x - begin;
}

constexpr auto var = "hello there";

template <size_t I, size_t Max>
constexpr auto fn()
{
    // Although I did this recursively, this could have also been done iteratively.
    if constexpr (I < Max) {
        auto x = fn<I + 1, Max>();
        x[I] = var[I];
        return x;
    }
    else {
        return std::array<char, Max>{};
    }
}

int main()
{
    auto x = fn<0, str_len(var)>();
    // %*.*s expects int width and precision, not size_t.
    printf("'%*.*s'\n", static_cast<int>(x.size()), static_cast<int>(x.size()), x.data());
    return 0;
}
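The comment in `fn` notes that the recursion could also be a loop. A sketch of that iterative variant, assuming C++17 (needed for the constexpr non-const `std::array::operator[]`); `to_array_no_nul` is a hypothetical name:

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t str_len(char const* x)
{
    char const* begin = x;
    while (*x) {
        ++x;
    }
    return static_cast<std::size_t>(x - begin);
}

constexpr auto var = "hello there";

// Iterative replacement for the recursive fn<I, Max>(): copy each
// character of var into an array sized to exclude the terminating NUL.
template <std::size_t Max>
constexpr std::array<char, Max> to_array_no_nul()
{
    std::array<char, Max> out{};
    for (std::size_t i = 0; i < Max; ++i) {
        out[i] = var[i];
    }
    return out;
}

constexpr auto greeting_bytes = to_array_no_nul<str_len(var)>();
```

Declaring the result `constexpr` forces the copy to happen at compile time rather than relying on the optimizer.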

This gives the following assembly:

.LC0:
  .string "'%*.*s'\n"
main:
  sub rsp, 24
  mov edx, 11
  mov esi, 11
  movabs rax, 7526676540175443304 ; <<< hello there
  mov QWORD PTR [rsp+5], rax
  mov eax, 29285
  lea rcx, [rsp+5]
  mov edi, OFFSET FLAT:.LC0
  mov WORD PTR [rsp+13], ax
  xor eax, eax
  mov BYTE PTR [rsp+15], 101
  call printf
  xor eax, eax
  add rsp, 24
  ret

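Yes, 7526676540175443304 is the first eight bytes, "hello th", packed into one 64-bit immediate; 29285 supplies "er" and 101 the final 'e', so the 11 bytes of "hello there" are written without any terminating NUL character. 😂 See Demo.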

Moving the first line of main() to global scope makes x a statically initialized global, so the string is emitted directly as data (the 11 .byte values under x: below), again with no terminating NUL.

.LC0:
  .string "'%*.*s'\n"
main:
  sub rsp, 8
  mov ecx, OFFSET FLAT:x
  mov edx, 11
  xor eax, eax
  mov esi, 11
  mov edi, OFFSET FLAT:.LC0
  call printf
  xor eax, eax
  add rsp, 8
  ret
x:           ; <<< hello there
  .byte 104
  .byte 101
  .byte 108
  .byte 108
  .byte 111
  .byte 32
  .byte 116
  .byte 104
  .byte 101
  .byte 114
  .byte 101

Demo

I can put it into a type as well:

template <char x, typename...Ts>
struct X
{
};

constexpr int str_len(char const * x)
{
    char const * begin = x;
    while (*x) {
        ++x;
    }
    return x - begin;
}

constexpr auto var = "hello there";

template <int I>
constexpr auto fn()
{
    if constexpr (I - 1 != 0)
        return X<var[str_len(var) - I], decltype(fn<I - 1>())>{};
    else
        return X<var[str_len(var) - I], void>{};
}

int main()
{
    decltype(nullptr)(fn<str_len(var)>());
    return 0;
}

Which gives me the output:

<source>:28:5: error: cannot convert 'X<'h', X<'e', X<'l', X<'l', X<'o', X<' ', X<'t', X<'h', X<'e', X<'r', X<'e', void> > > > > > > > > > >' to 'decltype(nullptr)' (aka 'nullptr_t') without a conversion operator
    decltype(nullptr)(fn<str_len(var)>());
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Demo

I could probably massage this further into the form I asked for above. However, the requirement was to store the string without a NUL terminator and to do so in C++0x, which this is not, so I won't mark this as the answer. But I thought I'd put it out there.

Edit

It seems that GCC and Clang also have an extension that allows a string literal to be turned into a template parameter pack:

template <char...Cs>
struct chars {};

template <typename T, T...Xs>
chars<Xs...> operator""_xxx() {
    return {};
}

int main()
{
    decltype(nullptr)("hello there"_xxx);
    return 0;
}

Which spits out:

<source>:5:14: warning: string literal operator templates are a GNU extension [-Wgnu-string-literal-operator-template]
chars<Xs...> operator""_xxx() {
             ^
<source>:11:5: error: cannot convert 'chars<'h', 'e', 'l', 'l', 'o', ' ', 't', 'h', 'e', 'r', 'e'>' to 'decltype(nullptr)' (aka 'nullptr_t') without a conversion operator
    decltype(nullptr)("hello there"_xxx);
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Demo
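Once the characters are a template parameter pack, storing them without a NUL is the same trick as the question's `x<...>::data`: expand the pack into a static array. A sketch assuming C++17 (where `static constexpr` data members are implicitly inline); it spells out `chars<...>` by hand so it does not depend on the GNU extension:

```cpp
#include <cstddef>

template <char... Cs>
struct chars
{
    // Exactly sizeof...(Cs) bytes; nothing is appended.
    // (Requires at least one character: a zero-length array is ill-formed.)
    static constexpr char data[sizeof...(Cs)] = { Cs... };
    static constexpr std::size_t size = sizeof...(Cs);
};

// With the GNU/Clang extension this type could come from "hello"_xxx.
using hello_t = chars<'h', 'e', 'l', 'l', 'o'>;
```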

Note that the only reason I can now think of for putting a string into a template argument is to pass it around as a constexpr value. That has some interesting possibilities, such as letting a constexpr function change its return type based on the string passed in.

Additional note: it isn't possible to pass a string directly to a constexpr function and have it change the return type, because a function parameter is not a constant expression, which is a bit annoying. The only way to manipulate a constexpr string and change the return type is to declare it outside the function as constexpr and then reference that variable from within the function, as shown in my second example.
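A variation on declaring the string outside the function, and close to the `<size_t N, char (&A)[N]>` template mentioned in the question: pass a reference to the namespace-scope `constexpr` array as a template argument, so one function template works for any such string. A sketch assuming C++17; `strip_nul` and `greeting` are hypothetical names:

```cpp
#include <array>
#include <cstddef>

// The string lives outside the function, as described above.
static constexpr char greeting[] = "hello there";

// N includes the terminating NUL, so the result holds N - 1 characters.
template <std::size_t N, const char (&S)[N]>
constexpr std::array<char, N - 1> strip_nul()
{
    std::array<char, N - 1> out{};
    for (std::size_t i = 0; i + 1 < N; ++i) {
        out[i] = S[i];
    }
    return out;
}

constexpr auto stripped = strip_nul<sizeof(greeting), greeting>();
```

The catch is the same as before: the string must be a named variable with static storage, not a literal written at the call site.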

Edit 2

It turns out that although you can't pass a string directly as a constexpr value, you can pass a lambda that returns it, and the lambda's call operator can then be used as a constexpr function.

#include <array>
#include <cstdio>

constexpr size_t str_len(char const * x)
{
    char const * begin = x;
    while (*x) {
        ++x;
    }
    return x - begin;
}

template <size_t I = 0, typename FN>
constexpr auto fn2(FN str) {
    constexpr auto Max = str_len(str());
    if constexpr (I < Max) {
        auto x = fn2<I + 1>(str);
        x[I] = str()[I];
        return x;
    }
    else {
        return std::array<char, Max>{};
    }
}

auto x = fn2<>([]{ return "hello there"; });

int main()
{
    // %*.*s expects int width and precision, not size_t.
    printf("'%*.*s'\n", static_cast<int>(x.size()), static_cast<int>(x.size()), x.data());
    return 0;
}

Which results in the same asm output as my first example. Demo

I'm frankly surprised that actually works.

Edit 3

Given that I have figured out how to pass a constexpr string, I can now create a non-recursive type:

#include <utility>

constexpr std::size_t str_len(char const * x)
{
    char const * begin = x;
    while (*x) {
        ++x;
    }
    return x - begin;
}

template <char...> struct c{};

template <typename FN, std::size_t...Is>
constexpr auto string_to_type_impl(FN str, std::index_sequence<Is...>)
{
    return c<str()[Is]...>{};
}

template <typename FN>
constexpr auto string_to_type(FN str)
{
    constexpr auto Max = str_len(str());
    return string_to_type_impl(str, std::make_index_sequence<Max>{});
}

int main()
{
    std::nullptr_t(string_to_type([]{ return "hello there"; }));
    return 0;
}

With the resulting output:

<source>:29:5: error: cannot convert 'c<'h', 'e', 'l', 'l', 'o', ' ', 't', 'h', 'e', 'r', 'e'>' to 'std::nullptr_t' (aka 'nullptr_t') without a conversion operator
    std::nullptr_t(string_to_type([]{ return "hello there"; }));
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.

Demo

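Of course, for these to work with C++11, the constexpr functions would have to be converted to recursive versions built on the ternary operator, and the C++14/C++17 features used here (if constexpr, constexpr lambdas, std::make_index_sequence) would need C++11 replacements as well.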
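As an illustration of that conversion, here is `str_len` rewritten in the C++11 style, where a `constexpr` function body may consist of essentially a single `return` statement. A sketch; very long strings may hit the compiler's constexpr recursion limit:

```cpp
#include <cstddef>

// C++11-compatible: recursion plus the conditional operator instead of a loop.
constexpr std::size_t str_len(char const* x, std::size_t n = 0)
{
    return *x ? str_len(x + 1, n + 1) : n;
}

static_assert(str_len("hello there") == 11, "length excludes the NUL");
```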

answered Sep 28 '22 at 17:09 by Adrian