Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fastest way to concatenate ReadOnlySpan<char> in C#

What's the most efficient way to concatenate strings, if I already only have ReadOnlySpan slices?

Simplified example:

public class Program {
    public string ConcatSpans(string longstring) {
        var span = longstring.AsSpan();
        var sb = new StringBuilder(longstring.Length);
        sb.Append(span.Slice(40, 10));
        sb.Append(span.Slice(30, 10));
        sb.Append(span.Slice(20, 10));
        sb.Append(span.Slice(10, 10));
        sb.Append(span.Slice(0, 10));
        return sb.ToString();
    }

    [Benchmark]
    public void ConcatSpansBenchmark() {
        ConcatSpans("aaaaaaaaaabbbbbbbbbbccccccccccddddddddddeeeeeeeeee");
    }

    public static void Main(string[] args) {
        var summary = BenchmarkRunner.Run<Program>();
    }
}

Results:

BenchmarkDotNet=v0.11.2, OS=Windows 10.0.17134.345 (1803/April2018Update/Redstone4)
Intel Core i5-2500K CPU 3.30GHz (Sandy Bridge), 1 CPU, 4 logical and 4 physical cores
.NET Core SDK=2.1.403
  [Host]     : .NET Core 2.1.5 (CoreCLR 4.6.26919.02, CoreFX 4.6.26919.02), 64bit RyuJIT
  DefaultJob : .NET Core 2.1.5 (CoreCLR 4.6.26919.02, CoreFX 4.6.26919.02), 64bit RyuJIT


               Method |     Mean |    Error |   StdDev | Gen 0/1k Op | Gen 1/1k Op | Gen 2/1k Op | Allocated Memory/Op |
--------------------- |---------:|---------:|---------:|------------:|------------:|------------:|--------------------:|
 ConcatSpansBenchmark | 126.6 ns | 1.712 ns | 1.601 ns |      0.0966 |           - |           - |               304 B |

Is StringBuilder really the best we can do? Is there a way to go faster than that? With even less allocations? After all StringBuilder object itself is a heap object.

If there was a ref struct StringBuilder that would only keep references to ReadOnlySpans and in the final ToString just allocate one string object?

like image 336
chrisn Avatar asked Jan 27 '23 04:01

chrisn


1 Answers

Edit: as Tseng notes in the comments, the newer string.Create method is the way to do this on platforms where it exists.


The scenario with multiple (but known) input spans is ideal for a "preallocate a dummy string, then pretend that strings are mutable and overwrite it before exposing it to the world" scenario. This looks gnarly, but this trick is very common in IO code when dealing with strings (especially from discontiguous buffers, etc), so it is well understood and supported.

Here we go (edit: now with added "hybrid" method, which avoids all the Slice() calls, without needing unsafe):

                        Method |     Mean |     Error |    StdDev |   Median |
------------------------------ |---------:|----------:|----------:|---------:|
          ConcatSpansBenchmark | 97.17 ns | 2.1335 ns | 4.0072 ns | 97.20 ns |
       OverwiteStringBenchmark | 63.34 ns | 1.2914 ns | 2.0854 ns | 62.29 ns |
      UnsafeOverwriteBenchmark | 17.95 ns | 0.3697 ns | 0.3796 ns | 17.80 ns |
 OverwiteStringHybridBenchmark | 53.59 ns | 0.5534 ns | 0.5176 ns | 53.49 ns |

Note: anything involving MemoryMarshal.*, Unsafe.* or the unsafe keyword is explicitly "I know what I'm doing... anything that exploded is probably my fault".

Code:

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using System.Text;

public class Program
{
    public string ConcatSpans(string longstring)
    {
        var span = longstring.AsSpan();
        var sb = new StringBuilder(longstring.Length);
        sb.Append(span.Slice(40, 10));
        sb.Append(span.Slice(30, 10));
        sb.Append(span.Slice(20, 10));
        sb.Append(span.Slice(10, 10));
        sb.Append(span.Slice(0, 10));
        return sb.ToString();
    }

    public string OverwiteString(string longstring)
    {
        var span = longstring.AsSpan();
        var s = new string('\0', longstring.Length);
        var writeable = MemoryMarshal.AsMemory(s.AsMemory()).Span;
        span.Slice(40, 10).CopyTo(writeable);
        writeable = writeable.Slice(10);
        span.Slice(30, 10).CopyTo(writeable);
        writeable = writeable.Slice(10);
        span.Slice(20, 10).CopyTo(writeable);
        writeable = writeable.Slice(10);
        span.Slice(10, 10).CopyTo(writeable);
        writeable = writeable.Slice(10);
        span.Slice(0, 10).CopyTo(writeable);
        return s;
    }

    public string OverwiteStringHybrid(string longstring)
    {
        var source = MemoryMarshal.AsBytes(MemoryMarshal.AsMemory(longstring.AsMemory()).Span);
        var s = new string('\0', longstring.Length);
        var target = MemoryMarshal.AsBytes(MemoryMarshal.AsMemory(s.AsMemory()).Span);

        Unsafe.CopyBlock(ref target[0], ref source[40 * sizeof(char)], 10 * sizeof(char));
        Unsafe.CopyBlock(ref target[10], ref source[30 * sizeof(char)], 10 * sizeof(char));
        Unsafe.CopyBlock(ref target[20], ref source[20 * sizeof(char)], 10 * sizeof(char));
        Unsafe.CopyBlock(ref target[30], ref source[10 * sizeof(char)], 10 * sizeof(char));
        Unsafe.CopyBlock(ref target[40], ref source[0], 10 * sizeof(char));

        return s;
    }

    public unsafe string UnsafeOverwrite(string longstring)
    {
        var s = new string('\0', longstring.Length);
        fixed (char* source = longstring)
        fixed (char* target = s)
        {
            Unsafe.CopyBlock(target, source + 40, 10 * sizeof(char));
            Unsafe.CopyBlock(target + 10, source + 30, 10 * sizeof(char));
            Unsafe.CopyBlock(target + 20, source + 20, 10 * sizeof(char));
            Unsafe.CopyBlock(target + 30, source + 10, 10 * sizeof(char));
            Unsafe.CopyBlock(target + 40, source, 10 * sizeof(char));
        }
        return s;
    }

    [Benchmark]
    public void ConcatSpansBenchmark()
        => ConcatSpans("aaaaaaaaaabbbbbbbbbbccccccccccddddddddddeeeeeeeeee");
    [Benchmark]
    public void OverwiteStringBenchmark()
    => OverwiteString("aaaaaaaaaabbbbbbbbbbccccccccccddddddddddeeeeeeeeee");
    [Benchmark]
    public void UnsafeOverwriteBenchmark()
    => UnsafeOverwrite("aaaaaaaaaabbbbbbbbbbccccccccccddddddddddeeeeeeeeee");

    [Benchmark]
    public void OverwiteStringHybridBenchmark()
    => OverwiteStringHybrid("aaaaaaaaaabbbbbbbbbbccccccccccddddddddddeeeeeeeeee");

    public static void Main(string[] args)
        => BenchmarkRunner.Run<Program>();
}

Note: in the general case - to get unsafe code from a slice:

with C# 7.3:

fixed(char* p = theSpan)
{
    ...
}

otherwise:

fixed(char* p = &MemoryMarshal.GetReference(theSpan))
{

}
like image 65
Marc Gravell Avatar answered Jan 29 '23 17:01

Marc Gravell