Span<char> and string equality

Tags:

c#

When Span<T> was announced, I wanted to use it in a parser for my toy programming language. (Actually, I'd probably store a Memory<char>, but that's beside the point.)

However, I have grown used to switching on strings:

switch (myString) {
    case "function":
        return TokenType.Function;
    // etc.
}

Switching on a Span<char> won't work, and allocating a String to compare against kind of defeats the purpose of using a Span.

Switching to using if-else statements would result in the same problem.

So, is there a way to do this efficiently? Does ToString() on a Span<char> not allocate?

Plasticcaz asked Mar 15 '18 00:03



2 Answers

System.MemoryExtensions contains methods that compare contents of Spans.

On .NET Core, which supports an implicit conversion from string to ReadOnlySpan<char>, you would have:

ReadOnlySpan<char> myString = "function";

if (MemoryExtensions.Equals(myString, "function", StringComparison.Ordinal))
{
    return TokenType.Function;
}
else if (MemoryExtensions.Equals(myString, "...", StringComparison.Ordinal))
{
    ... 
}

I'm calling the MemoryExtensions.Equals explicitly here because that way it is happy with the implicit conversion of the string literal (e.g. "function") to a ReadOnlySpan<char> for comparison purposes. If you were to call this extension method in an object-oriented way, you would need to explicitly use AsSpan:

if (myString.Equals("function".AsSpan(), StringComparison.Ordinal))
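As a fuller illustration of the comparison above, here is a minimal sketch that classifies a slice of the source text without creating a string for it. The Identifier member and the Keywords.Classify helper are hypothetical names made up for this example; only TokenType.Function appears in the question. The explicit AsSpan() calls are used so that the sketch should also compile against the System.Memory NuGet package.

```csharp
using System;

// Hypothetical enum: the question only shows TokenType.Function.
public enum TokenType { Function, Identifier }

public static class Keywords
{
    // Hypothetical helper: compares the span's contents against a keyword
    // without calling ToString() on the span.
    public static TokenType Classify(ReadOnlySpan<char> text)
    {
        if (text.Equals("function".AsSpan(), StringComparison.Ordinal))
            return TokenType.Function;

        return TokenType.Identifier;
    }
}

public static class Program
{
    public static void Main()
    {
        string source = "function test();";

        // Characters 0..7 are "function", characters 9..12 are "test".
        Console.WriteLine(Keywords.Classify(source.AsSpan(0, 8))); // Function
        Console.WriteLine(Keywords.Classify(source.AsSpan(9, 4))); // Identifier
    }
}
```

Both calls to Classify look at the original string's characters in place; the only strings involved are the source text and the keyword literal.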

If you are particularly attached to the switch statement, you could abuse the pattern matching feature to smuggle the comparisons in, but that would not look very readable or even helpful:

ReadOnlySpan<char> myString = "function";

switch (myString)
{
    // The type pattern always matches; the real test is in the when clause.
    case ReadOnlySpan<char> s when MemoryExtensions.Equals(s, "function", StringComparison.Ordinal):
        return TokenType.Function;
    case ReadOnlySpan<char> s when MemoryExtensions.Equals(s, "...", StringComparison.Ordinal):
        ...
        break;
}

If you are not using .NET Core and have installed the System.Memory NuGet package separately, you need to append .AsSpan() to each of the string literals.

GSerg answered Oct 11 '22 15:10


Calling ToString() would cause an allocation, because strings are immutable and the characters have to be copied into a new string. Instead, consider using the various MemoryExtensions methods to perform the comparison. That way you can leave the source code being parsed in a Span<char> and use code such as the following:

System.ReadOnlySpan<char> myString = "function test();".AsSpan();
if (myString.StartsWith("function".AsSpan()))
    Console.WriteLine("function");

Each token still needs a string to compare against (the myString allocation was just to demonstrate), but string literals are created only once rather than on every comparison, and you could also initialize the token table as a one-off operation outside the token parser method. You might also want to look into the Slice method as an efficient way to move through the code as you parse it.
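To make the Slice suggestion concrete, here is a rough sketch of walking through the source with Slice and comparing each word in place. The Lexer class and its WordLength helper are hypothetical names invented for this example, and treating any run of letters as a word is a simplifying assumption, not a real tokenizer.

```csharp
using System;

public static class Lexer
{
    // Hypothetical helper: length of the leading run of letters in the span.
    public static int WordLength(ReadOnlySpan<char> text)
    {
        int i = 0;
        while (i < text.Length && char.IsLetter(text[i]))
            i++;
        return i;
    }
}

public static class Program
{
    public static void Main()
    {
        ReadOnlySpan<char> rest = "function test();".AsSpan();

        while (!rest.IsEmpty)
        {
            int length = Lexer.WordLength(rest);
            if (length == 0)
            {
                // Not a letter: skip one character (whitespace, punctuation).
                rest = rest.Slice(1);
                continue;
            }

            // Slice returns a view over the same memory; nothing is copied.
            ReadOnlySpan<char> word = rest.Slice(0, length);
            bool isKeyword = word.SequenceEqual("function".AsSpan());
            Console.WriteLine(isKeyword ? "keyword" : "identifier");

            rest = rest.Slice(length);
        }
    }
}
```

For the sample input this prints "keyword" and then "identifier". SequenceEqual compares the characters one by one, which amounts to an ordinal, case-sensitive comparison.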

Thanks to GSerg for pointing out, among other things, that .NET Core can handle the implicit conversion from string to ReadOnlySpan<char>, so you can omit the AsSpan() if you are using .NET Core.

PeterJ answered Oct 11 '22 16:10