String to byte array in UTF-8?

How do I convert a WideString (or other long string) to a byte array in UTF-8?

2 Answers

A function like this will do what you need:

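function UTF8Bytes(const s: UTF8String): TBytes;
begin
  // UTF-8 code units are one byte each, so Length(s) is also the byte count
  Assert(StringElementSize(s)=1);
  SetLength(Result, Length(s));
  if Length(Result)>0 then
    Move(s[1], Result[0], Length(s));
end;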

You can call it with any type of string, and the RTL will convert from the encoding of the string that is passed to UTF-8. So don't be tricked into thinking you must convert to UTF-8 before calling. Just pass in any string and let the RTL do the work.
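As a rough illustration of that implicit conversion, the sketch below passes a WideString and then a native string to the function. It assumes Delphi 2009 or later (where UTF8String carries its own code page). The program name and the sample text are made up for the example.

```delphi
program Utf8BytesDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Same function as in the answer, repeated so the sketch compiles on its own.
function UTF8Bytes(const s: UTF8String): TBytes;
begin
  SetLength(Result, Length(s));
  if Length(s) > 0 then
    Move(s[1], Result[0], Length(s));
end;

var
  w: WideString;
  u: string;
  b: TBytes;
  i: Integer;
begin
  // Hypothetical sample text: 'h' followed by U+00E9 (e with acute accent).
  w := 'h';
  w := w + WideChar($00E9);
  u := w;

  // The compiler converts the WideString argument to UTF-8 at the call site
  // (it may warn about the implicit string cast).
  b := UTF8Bytes(w);
  for i := 0 to High(b) do
    Write(IntToHex(b[i], 2), ' ');
  Writeln;

  // A native string argument is converted the same way.
  b := UTF8Bytes(u);
  Writeln(Length(b));
end.
```

U+00E9 takes two bytes in UTF-8, so both calls should yield three bytes for this two-character input.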

After that it's a fairly standard array copy. Note the assertion that explicitly calls out the assumption on string element size for a UTF-8 encoded string.

If you also want the zero terminator in the array, write it like this:

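function UTF8Bytes(const s: UTF8String): TBytes;
begin
  Assert(StringElementSize(s)=1);
  SetLength(Result, Length(s)+1);
  // Test s rather than Result: Result is never empty here, and s[1] is invalid for an empty string
  if Length(s)>0 then
    Move(s[1], Result[0], Length(s));
  Result[high(Result)] := 0;
end;

You can use TEncoding.UTF8.GetBytes, which is declared in SysUtils.pas.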
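A minimal sketch of that call, assuming Delphi 2009 or later, where string is a Unicode string. The program name and sample text are invented for the example, and the explicit string(...) cast is only there to make the WideString conversion visible.

```delphi
program EncodingDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils;

var
  w: WideString;
  b: TBytes;
begin
  // Hypothetical sample text: 'h' followed by U+00E9.
  w := 'h';
  w := w + WideChar($00E9);

  // GetBytes takes a native string and returns its UTF-8 bytes.
  b := TEncoding.UTF8.GetBytes(string(w));
  Writeln(Length(b));

  // Decode the bytes again to check the round trip.
  Writeln(TEncoding.UTF8.GetString(b) = string(w));
end.
```

GetBytes returns only the encoded text: it adds neither a byte order mark nor a zero terminator, so append the terminator yourself if you need one.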

