How can I parse a UTF8 string from a ReadOnlySequence<byte>?
A ReadOnlySequence<byte> is made of segments, and because UTF8 characters are variable length, the break between two segments can fall in the middle of a character. So simply calling Encoding.UTF8.GetString() on each segment and combining the results in a StringBuilder will not work.
Is it possible to parse a UTF8 string from a ReadOnlySequence<byte> without first combining the segments into an array? I would prefer to avoid a memory allocation here.
The first thing we should do here is test whether the sequence actually is a single span; if it is, we can hugely simplify and optimize.
Once we know that we have a multi-segment (discontiguous) buffer, there are two ways we can go:
1. linearize the segments into a contiguous buffer (for example, one leased from ArrayPool<byte>.Shared) and call GetString on the filled portion of it, or
2. use the GetDecoder() API on the encoding and use the decoder to populate a new string, which on older frameworks means overwriting a newly allocated string and on newer frameworks means using the string.Create API.
The first option is massively simpler, but involves a few memory-copy operations (with no additional allocations other than the string):
using System.Buffers;
using System.Text;

// Extension method: declare it inside a static class.
public static string GetString(in this ReadOnlySequence<byte> payload,
    Encoding encoding = null)
{
    encoding ??= Encoding.UTF8;
    // Fast path: a single segment can be decoded directly.
    return payload.IsSingleSegment ? encoding.GetString(payload.FirstSpan)
        : GetStringSlow(payload, encoding);

    static string GetStringSlow(in ReadOnlySequence<byte> payload, Encoding encoding)
    {
        // linearize into a rented buffer, which may be larger than requested
        int length = checked((int)payload.Length);
        var oversized = ArrayPool<byte>.Shared.Rent(length);
        try
        {
            payload.CopyTo(oversized);
            // decode only the bytes that were copied, not the whole rented array
            return encoding.GetString(oversized, 0, length);
        }
        finally
        {
            ArrayPool<byte>.Shared.Return(oversized);
        }
    }
}
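The GetDecoder() route mentioned above could look roughly like the sketch below. It makes one substitution that should be stated plainly: rather than writing into the string with string.Create, it decodes into a char buffer rented from ArrayPool<char>.Shared and builds the string from that, so the only lasting allocation is still the string itself. The names SequenceDecoderSketch and GetStringViaDecoder are hypothetical, and the code assumes a runtime with the span-based Decoder.GetChars overload (.NET Core 2.1 or later) and FirstSpan (.NET Core 3.0 or later).

```csharp
using System;
using System.Buffers;
using System.Text;

public static class SequenceDecoderSketch
{
    // Hypothetical helper: decodes segment by segment with a stateful Decoder,
    // without copying the bytes into one contiguous buffer first.
    public static string GetStringViaDecoder(in ReadOnlySequence<byte> payload,
        Encoding encoding = null)
    {
        encoding ??= Encoding.UTF8;
        if (payload.IsSingleSegment) return encoding.GetString(payload.FirstSpan);

        // Worst-case number of chars for the total byte length.
        int maxChars = encoding.GetMaxCharCount(checked((int)payload.Length));
        char[] chars = ArrayPool<char>.Shared.Rent(maxChars);
        try
        {
            Decoder decoder = encoding.GetDecoder();
            int written = 0;
            foreach (ReadOnlyMemory<byte> segment in payload)
            {
                // flush: false keeps an incomplete trailing character inside the
                // decoder so that the next segment can complete it.
                written += decoder.GetChars(segment.Span, chars.AsSpan(written), flush: false);
            }
            // Final flush: handles any bytes still pending after the last segment.
            written += decoder.GetChars(ReadOnlySpan<byte>.Empty, chars.AsSpan(written), flush: true);
            return new string(chars, 0, written);
        }
        finally
        {
            ArrayPool<char>.Shared.Return(chars);
        }
    }
}
```

The important detail is flush: false inside the loop. That is what lets a character whose bytes straddle two segments be decoded once, correctly, instead of being replaced at each boundary.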
It seems that .NET 5.0 introduced EncodingExtensions.GetString to solve this problem. The documentation describes it as:
Decodes the specified ReadOnlySequence<Byte> into a String using the specified Encoding.
using System.Text;
string message = EncodingExtensions.GetString(Encoding.UTF8, buffer);
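To check that this handles a character split across segments, a small test sequence can be built by hand. The sketch below assumes .NET 5 or later; Chunk and SplitCharacterDemo are hypothetical names introduced only for illustration. It splits the two-byte encoding of "é" between two segments, then decodes the sequence once as a whole and once naively, segment by segment.

```csharp
using System;
using System.Buffers;
using System.Text;

// Hypothetical helper for building a multi-segment sequence by hand.
sealed class Chunk : ReadOnlySequenceSegment<byte>
{
    public Chunk(ReadOnlyMemory<byte> memory) => Memory = memory;

    // Links a new segment after this one and returns it.
    public Chunk Append(ReadOnlyMemory<byte> memory)
    {
        var next = new Chunk(memory) { RunningIndex = RunningIndex + Memory.Length };
        Next = next;
        return next;
    }
}

static class SplitCharacterDemo
{
    public static ReadOnlySequence<byte> BuildSplitSequence()
    {
        byte[] bytes = Encoding.UTF8.GetBytes("h\u00E9");  // 0x68 0xC3 0xA9
        var first = new Chunk(bytes.AsMemory(0, 2));        // ends in the middle of the character
        var last = first.Append(bytes.AsMemory(2, 1));      // holds the trailing byte
        return new ReadOnlySequence<byte>(first, 0, last, last.Memory.Length);
    }

    // Whole-sequence decode through the EncodingExtensions.GetString extension method.
    public static string DecodeWhole() => Encoding.UTF8.GetString(BuildSplitSequence());

    // The naive approach from the question: decode each segment on its own.
    public static string DecodePerSegment()
    {
        var sb = new StringBuilder();
        foreach (ReadOnlyMemory<byte> segment in BuildSplitSequence())
        {
            sb.Append(Encoding.UTF8.GetString(segment.Span));
        }
        return sb.ToString();
    }
}
```

DecodeWhole() returns "hé", while DecodePerSegment() does not, because each half of the split character is decoded as invalid input on its own.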