How to convert a WideString (or other long string) to byte array in UTF-8?

2 Answers

A function like this will do what you need:

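function UTF8Bytes(const s: UTF8String): TBytes;
begin
  // UTF-8 strings use one-byte code units, so Length(s) is the byte count.
  Assert(StringElementSize(s)=1);
  SetLength(Result, Length(s));
  // Skip the copy for an empty string, where s[1] would be out of range.
  if Length(Result)>0 then
    Move(s[1], Result[0], Length(s));
end;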

You can call it with any type of string, and the RTL will convert from the encoding of the string that is passed to UTF-8. So don't be tricked into thinking you must convert to UTF-8 before calling. Just pass in any string and let the RTL do the work.

After that it's a fairly standard array copy. Note the assertion that explicitly calls out the assumption on string element size for a UTF-8 encoded string.
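As a rough illustration of the implicit conversion described above, here is a minimal sketch of a test program that passes a WideString straight to the function and dumps the resulting bytes in hex. The program name and the sample text are made up for the example. It assumes Free Pascal 3.x in Delphi mode (or a Unicode Delphi), and it assumes that on Unix-like targets a widestring manager such as the cwstring unit is needed for the conversion to handle non-ASCII characters.

```pascal
program Utf8BytesDemo;  // hypothetical name, for illustration only
{$IFDEF FPC}{$mode delphi}{$ENDIF}

uses
  {$IFDEF UNIX}cwstring,{$ENDIF}  // assumption: needed by FPC on Unix for non-ASCII conversions
  SysUtils;

function UTF8Bytes(const s: UTF8String): TBytes;
begin
  Assert(StringElementSize(s)=1);
  SetLength(Result, Length(s));
  if Length(Result)>0 then
    Move(s[1], Result[0], Length(s));
end;

var
  ws: WideString;
  b: TBytes;
  i: Integer;
begin
  // Build 'caf' + U+00E9 without relying on the source file encoding.
  ws := 'caf';
  ws := ws + WideChar($00E9);

  // No explicit conversion: the compiler converts WideString to UTF8String at the call.
  b := UTF8Bytes(ws);

  // Print each byte as two hex digits.
  for i := 0 to High(b) do
    Write(IntToHex(b[i], 2), ' ');
  WriteLn;
end.
```

U+00E9 is encoded in UTF-8 as the two bytes C3 A9, so the array should be one byte longer than the four-character input when the conversion works as intended.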

If you want to include the zero-terminator, write it like this:

function UTF8Bytes(const s: UTF8String): TBytes;
begin
  Assert(StringElementSize(s)=1);
  SetLength(Result, Length(s)+1);
  // Test the source length: Result is never empty here, and s[1] is out of range for an empty string.
  if Length(s)>0 then
    Move(s[1], Result[0], Length(s));
  Result[high(Result)] := 0;
end;

answered Sep 30 '22 16:09

David Heffernan

You can use TEncoding.UTF8.GetBytes, declared in SysUtils.pas.
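A minimal sketch of how that call might look in practice follows. The program name and the sample text are invented for the example. It assumes an RTL recent enough to ship TEncoding in SysUtils (Free Pascal 3.x or a Unicode Delphi), and, as an assumption for Unix-like Free Pascal targets, pulls in cwstring as the widestring manager.

```pascal
program EncodingDemo;  // hypothetical name, for illustration only
{$IFDEF FPC}{$mode delphi}{$ENDIF}

uses
  {$IFDEF UNIX}cwstring,{$ENDIF}  // assumption: widestring manager for FPC on Unix
  SysUtils;

var
  us: UnicodeString;
  b: TBytes;
  i: Integer;
begin
  // Build 'caf' + U+00E9 without relying on the source file encoding.
  us := 'caf';
  us := us + WideChar($00E9);

  // GetBytes returns the encoded bytes only: no byte order mark, no zero-terminator.
  b := TEncoding.UTF8.GetBytes(us);

  // Print each byte as two hex digits.
  for i := 0 to High(b) do
    Write(IntToHex(b[i], 2), ' ');
  WriteLn;
end.
```

If a trailing zero byte is required, the array can be extended by one element after the call, since dynamic array elements added by SetLength are zero-initialized.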

answered Sep 30 '22 16:09

Mikael Eriksson

Tags:

utf-8

freepascal

lazarus

Mariusz
