While writing a class for parsing certain text, I needed the ability to get the line number of a specific character position (in other words, count all linebreaks that occur before that character).
Trying to find the most efficient code possible to achieve this I set up a couple of benchmarks, which revealed that Regex was the slowest method and that manually iterating the string was the fastest.
Following is my current approach (10k iterations: 278 ms):
private string text;
/// <summary>
/// Returns whether the specified character index is the end of a line.
/// </summary>
/// <param name="index">The index to check.</param>
/// <returns></returns>
private bool IsEndOfLine(int index)
{
//Matches "\r" and "\n" (but not "\n" if it's preceded by "\r").
char c = text[index];
return c == '\r' || (c == '\n' && (index == 0 || text[index - 1] != '\r'));
}
/// <summary>
/// Returns the number of the line at the specified character index.
/// </summary>
/// <param name="index">The index of the character which's line number to get.</param>
/// <returns></returns>
public int GetLineNumber(int index)
{
if(index < 0 || index > text.Length) { throw new ArgumentOutOfRangeException("index"); }
int lineNumber = 1;
int end = index;
index = 0;
while(index < end) {
if(IsEndOfLine(index)) lineNumber++;
index++;
}
return lineNumber;
}
However, while doing these benchmarks I remembered that method calls can sometimes be a bit expensive, so I decided to try moving the conditions from IsEndOfLine()
directly into the if
-statement inside GetLineNumber()
as well.
Like I expected, this executes more than two times faster (10k iterations: 112 ms):
while(index < end) {
char c = text[index];
if(c == '\r' || (c == '\n' && (index == 0 || text[index - 1] != '\r'))) lineNumber++;
index++;
}
From what I've read the JIT compiler doesn't (or at least didn't) optimize IL code that is more than 32 bytes in size[1] unless [MethodImplAttribute(MethodImplOptions.AggressiveInlining)]
is specified[2]. But despite applying this attribute to IsEndOfLine()
, no inlining seems to occur.
Most of the talk I've been able to find about this are from older posts/articles. In the newest one ([2] from 2012) the author did apparently successfully inline a 34-byte function using MethodImplOptions.AggressiveInlining
, implying that the flag allows larger IL code to be inlined if all other criteria are met.
Measuring my method's size using the following code revealed that it is 54 bytes long:
Console.WriteLine(this.GetType().GetMethod("IsEndOfLine").GetMethodBody().GetILAsByteArray().Length);
Using the Dissasembly window in VS 2019 shows the following Assembly code for IsEndOfLine()
(with C# source code turned on in the Viewing Options):
(Configuration: Release (x86), disabled Just My Code and Suppress JIT optimization on module load)
--- [PATH REMOVED]\Performance Test - Find text line number\TextParser.cs
28: char c = text[index];
001E19BA in al,dx
001E19BB mov eax,dword ptr [ecx+4]
001E19BE cmp edx,dword ptr [eax+4]
001E19C1 jae 001E19FF
001E19C3 movzx eax,word ptr [eax+edx*2+8]
29: return c == '\r' || (c == '\n' && (index == 0 || text[index - 1] != '\r'));
001E19C8 cmp eax,0Dh
001E19CB je 001E19F8
001E19CD cmp eax,0Ah
001E19D0 jne 001E19F4
001E19D2 test edx,edx
001E19D4 je 001E19ED
001E19D6 dec edx
001E19D7 mov eax,dword ptr [ecx+4]
001E19DA cmp edx,dword ptr [eax+4]
001E19DD jae 001E19FF
001E19DF cmp word ptr [eax+edx*2+8],0Dh
001E19E5 setne al
001E19E8 movzx eax,al
001E19EB pop ebp
001E19EC ret
001E19ED mov eax,1
001E19F2 pop ebp
001E19F3 ret
001E19F4 xor eax,eax
001E19F6 pop ebp
001E19F7 ret
001E19F8 mov eax,1
001E19FD pop ebp
001E19FE ret
001E19FF call 70C2E2B0
001E1A04 int 3
...and the following code for the loop in GetLineNumber()
:
63: index = 0;
001E1950 xor esi,esi
64: while(index < end) {
001E1952 test ebx,ebx
001E1954 jle 001E196C
001E1956 mov ecx,edi
001E1958 mov edx,esi
001E195A call dword ptr ds:[144E10h]
001E1960 test eax,eax
001E1962 je 001E1967
65: if(IsEndOfLine(index)) lineNumber++;
001E1964 inc dword ptr [ebp-10h]
66: index++;
001E1967 inc esi
64: while(index < end) {
001E1968 cmp esi,ebx
001E196A jl 001E1956
67: }
68:
69: return lineNumber;
001E196C mov eax,dword ptr [ebp-10h]
001E196F pop ecx
001E1970 pop ebx
001E1971 pop esi
001E1972 pop edi
001E1973 pop ebp
001E1974 ret
I'm not very good at reading Assembly code, but it seems to me that no inlining has occurred.
Why doesn't the JIT compiler inline my IsEndOfLine()
method even when MethodImplOptions.AggressiveInlining
is specified? I know this flag is only a hint to the compiler, but based on [2] applying it should make it possible to inline IL larger than 32 bytes. Apart from that, to me, my code seems to satisfy all other conditions.
Is there some other kind of limitation that I am missing?
Results:
Text length: 11645
Line: 201
Standard loop: 00:00:00.2779946 (10000 à 00:00:00.0000277)
Line: 201
Standard loop (inline): 00:00:00.1122908 (10000 à 00:00:00.0000112)
<benchmark code moved to answer for brevity>
1To Inline or not to Inline: That is the question
2Aggressive Inlining in the CLR 4.5 JIT
For some reason, after restarting VS, enabling and re-disabling the settings mentioned before as well as re-applying MethodImplOptions.AggressiveInlining
, the method does now appear to be inlined. However, it has added a couple of instructions that aren't there when you inline the if
-conditions manually.
JIT-optimized version:
66: while(index < end) {
001E194B test ebx,ebx
001E194D jle 001E1998
001E194F mov esi,dword ptr [ecx+4]
67: if(IsEndOfLine(index)) lineNumber++;
001E1952 cmp edx,esi
001E1954 jae 001E19CA
001E1956 movzx eax,word ptr [ecx+edx*2+8]
001E195B cmp eax,0Dh
001E195E je 001E1989
001E1960 cmp eax,0Ah
001E1963 jne 001E1985
001E1965 test edx,edx
001E1967 je 001E197E
001E1969 mov eax,edx
001E196B dec eax
001E196C cmp eax,esi
001E196E jae 001E19CA
001E1970 cmp word ptr [ecx+eax*2+8],0Dh
001E1976 setne al
001E1979 movzx eax,al
001E197C jmp 001E198E
001E197E mov eax,1
001E1983 jmp 001E198E
001E1985 xor eax,eax
001E1987 jmp 001E198E
001E1989 mov eax,1
001E198E test eax,eax
001E1990 je 001E1993
001E1992 inc edi
68: index++;
My optimized version:
87: while(index < end) {
001E1E9B test ebx,ebx
001E1E9D jle 001E1ECE
001E1E9F mov esi,dword ptr [ecx+4]
88: char c = text[index];
001E1EA2 cmp edx,esi
001E1EA4 jae 001E1F00
001E1EA6 movzx eax,word ptr [ecx+edx*2+8]
89: if(c == '\r' || (c == '\n' && (index == 0 || text[index - 1] != '\r'))) lineNumber++;
001E1EAB cmp eax,0Dh
001E1EAE je 001E1EC8
001E1EB0 cmp eax,0Ah
001E1EB3 jne 001E1EC9
001E1EB5 test edx,edx
001E1EB7 je 001E1EC8
001E1EB9 mov eax,edx
001E1EBB dec eax
001E1EBC cmp eax,esi
001E1EBE jae 001E1F00
001E1EC0 cmp word ptr [ecx+eax*2+8],0Dh
001E1EC6 je 001E1EC9
001E1EC8 inc edi
90: index++;
New instructions:
001E1976 setne al
001E1979 movzx eax,al
001E197C jmp 001E198E
001E197E mov eax,1
001E1983 jmp 001E198E
001E1985 xor eax,eax
001E1987 jmp 001E198E
001E1989 mov eax,1
001E198E test eax,eax
I still see no improvement in performance/execution speed, however... Supposedly this is due to the extra instructions that the JIT added, and I'm guessing this is as good as it gets without inlining the conditions myself?
For some reason, after restarting VS, enabling and re-disabling the settings mentioned before as well as re-applying MethodImplOptions.AggressiveInlining
, the method does now appear to be inlined (weird that it didn't before...). However it has added a couple of instructions that aren't there when you inline the if
-conditions manually.
The efficiency/execution speed appears to stay the same, however. Hans Passant suggested that I replace the short-circuiting operators (where possible) with the regular |
and &
, which does reduce the speed gap from 2x to 1.5x. I'm guessing this is as good as it gets when it comes to JIT optimization.
return c == '\r' | (c == '\n' & (index == 0 || text[index - 1] != '\r'));
An interesting discovery I made (or, at least, interesting to me as I don't really understand how these Assembly-level optimizations work under the hood) was that when the same operator swap is done to the manually inlined conditions (inside GetLineNumberInline()
), execution speed gets much worse.
The purpose of this adventure was to get as efficient code as possible without having to duplicate it everywhere I use it (since in the original code IsEndOfLine()
is used multiple times throughout the project). In the end I think I'll stick to duplicating the IsEndOfLine()
code only inside GetLineNumber()
, as that proved to be the fastest in terms of execution speed.
I would like to thank the ones who took their time trying to help me (some comments have been removed), as while I didn't achieve the result that I thought JIT-optimized inlining would get me, I've still learned a lot that I didn't know before. Now I've at least gotten a small glimpse of what JIT optimization does under the hood and how it is much more complicated than what I could originally have imagined.
Complete benchmark results, for future reference (ordered by execution time):
Text length: 15882 Character position: 11912 Standard loop (inline): 00:00:00.1429526 (10000 à 0.0142 ms) Standard loop (inline unsafe): 00:00:00.1642801 (10000 à 0.0164 ms) Standard loop (inline + no short-circuit): 00:00:00.3250843 (10000 à 0.0325 ms) Standard loop (AggressiveInlining): 00:00:00.3318966 (10000 à 0.0331 ms) Standard loop (unsafe): 00:00:00.3605394 (10000 à 0.0360 ms) Standard loop: 00:00:00.3859629 (10000 à 0.0385 ms) Regex (Substring): 00:00:01.8794045 (10000 à 0.1879 ms) Regex (MatchCollection loop): 00:00:02.4916785 (10000 à 0.2491 ms) Resulting line: 284 /* "unsafe" is using pointers to access the string's characters */
class Program
{
const int RUNS = 10000;
static void Main(string[] args)
{
string text = "";
Random r = new Random();
//Some words to fill the string with.
string[] words = new string[] { "Hello", "world", "Inventory.MaxAmount 32", "+QUICKTORETALIATE", "TNT1 AABBCC 6 A_JumpIf(ACS_ExecuteWithResult(460, 0, 0, 0) == 0, \"See0\")" };
//Various line endings.
string[] endings = new string[] { "\r\n", "\r", "\n" };
/*
Generate text
*/
int lineCount = r.Next(256, 513);
for(int l = 0; l < lineCount; l++) {
int wordCount = r.Next(1, 4);
text += new string(' ', r.Next(4, 9));
for(int w = 0; w < wordCount; w++) {
text += words[wordCount] + (w < wordCount - 1 ? " " : "");
}
text += endings[r.Next(0, endings.Length)];
}
Console.WriteLine("Text length: " + text.Length);
Console.WriteLine();
/*
Initialize class and stopwatch
*/
TextParser parser = new TextParser(text);
Stopwatch sw = new Stopwatch();
List<int> numbers = new List<int>(); //Using a list to prevent the compiler from optimizing-away the "GetLineNumber" call.
/*
Test 1 - Standard loop
*/
sw.Restart();
for(int x = 0; x < RUNS; x++) {
numbers.Add(parser.GetLineNumber((int)(text.Length * 0.75) + r.Next(-4, 4)));
}
sw.Stop();
Console.WriteLine("Line: " + numbers[0]);
Console.WriteLine("Standard loop: ".PadRight(41) + sw.Elapsed.ToString() + " (" + numbers.Count + " à " + new TimeSpan(sw.Elapsed.Ticks / numbers.Count).TotalMilliseconds.ToString() + " ms)");
Console.WriteLine();
numbers = new List<int>();
/*
Test 2 - Standard loop (with AggressiveInlining)
*/
sw.Restart();
for(int x = 0; x < RUNS; x++) {
numbers.Add(parser.GetLineNumber2((int)(text.Length * 0.75) + r.Next(-4, 4)));
}
sw.Stop();
Console.WriteLine("Line: " + numbers[0]);
Console.WriteLine("Standard loop (AggressiveInlining): ".PadRight(41) + sw.Elapsed.ToString() + " (" + numbers.Count + " à " + new TimeSpan(sw.Elapsed.Ticks / numbers.Count).TotalMilliseconds.ToString() + " ms)");
Console.WriteLine();
numbers = new List<int>();
/*
Test 3 - Standard loop (with inline check)
*/
sw.Restart();
for(int x = 0; x < RUNS; x++) {
numbers.Add(parser.GetLineNumberInline((int)(text.Length * 0.75) + r.Next(-4, 4)));
}
sw.Stop();
Console.WriteLine("Line: " + numbers[0]);
Console.WriteLine("Standard loop (inline): ".PadRight(41) + sw.Elapsed.ToString() + " (" + numbers.Count + " à " + new TimeSpan(sw.Elapsed.Ticks / numbers.Count).TotalMilliseconds.ToString() + " ms)");
Console.WriteLine();
numbers = new List<int>();
/*
Test 4 - Standard loop (with inline and no short-circuiting)
*/
sw.Restart();
for(int x = 0; x < RUNS; x++) {
numbers.Add(parser.GetLineNumberInline2((int)(text.Length * 0.75) + r.Next(-4, 4)));
}
sw.Stop();
Console.WriteLine("Line: " + numbers[0]);
Console.WriteLine("Standard loop (inline + no short-circuit): ".PadRight(41) + sw.Elapsed.ToString() + " (" + numbers.Count + " à " + new TimeSpan(sw.Elapsed.Ticks / numbers.Count).TotalMilliseconds.ToString() + " ms)");
Console.WriteLine();
numbers = new List<int>();
/*
Test 5 - Standard loop (with unsafe check)
*/
sw.Restart();
for(int x = 0; x < RUNS; x++) {
numbers.Add(parser.GetLineNumberUnsafe((int)(text.Length * 0.75) + r.Next(-4, 4)));
}
sw.Stop();
Console.WriteLine("Line: " + numbers[0]);
Console.WriteLine("Standard loop (unsafe): ".PadRight(41) + sw.Elapsed.ToString() + " (" + numbers.Count + " à " + new TimeSpan(sw.Elapsed.Ticks / numbers.Count).TotalMilliseconds.ToString() + " ms)");
Console.WriteLine();
numbers = new List<int>();
/*
Test 6 - Standard loop (with inline + unsafe check)
*/
sw.Restart();
for(int x = 0; x < RUNS; x++) {
numbers.Add(parser.GetLineNumberUnsafeInline((int)(text.Length * 0.75) + r.Next(-4, 4)));
}
sw.Stop();
Console.WriteLine("Line: " + numbers[0]);
Console.WriteLine("Standard loop (inline unsafe): ".PadRight(41) + sw.Elapsed.ToString() + " (" + numbers.Count + " à " + new TimeSpan(sw.Elapsed.Ticks / numbers.Count).TotalMilliseconds.ToString() + " ms)");
Console.WriteLine();
numbers = new List<int>();
/*
Test 7 - Regex (with Substring)
*/
sw.Restart();
for(int x = 0; x < RUNS; x++) {
numbers.Add(parser.GetLineNumberRegex((int)(text.Length * 0.75) + r.Next(-4, 4)));
}
sw.Stop();
Console.WriteLine("Line: " + numbers[0]);
Console.WriteLine("Regex (Substring): ".PadRight(41) + sw.Elapsed.ToString() + " (" + numbers.Count + " à " + new TimeSpan(sw.Elapsed.Ticks / numbers.Count).TotalMilliseconds.ToString() + " ms)");
Console.WriteLine();
numbers = new List<int>();
/*
Test 8 - Regex (with MatchCollection loop)
*/
sw.Restart();
for(int x = 0; x < RUNS; x++) {
numbers.Add(parser.GetLineNumberRegex2((int)(text.Length * 0.75) + r.Next(-4, 4)));
}
sw.Stop();
Console.WriteLine("Line: " + numbers[0]);
Console.WriteLine("Regex (MatchCollection loop): ".PadRight(41) + sw.Elapsed.ToString() + " (" + numbers.Count + " à " + new TimeSpan(sw.Elapsed.Ticks / numbers.Count).TotalMilliseconds.ToString() + " ms)");
Console.WriteLine();
numbers = new List<int>();
/*
Tests completed
*/
Console.Write("All tests completed. Press ENTER to close...");
while(Console.ReadKey(true).Key != ConsoleKey.Enter);
}
}
public class TextParser
{
private static readonly Regex LineRegex = new Regex("\r\n|\r|\n", RegexOptions.Compiled);
private string text;
public TextParser(string text)
{
this.text = text;
}
/// <summary>
/// Returns whether the specified character index is the end of a line.
/// </summary>
/// <param name="index">The index to check.</param>
/// <returns></returns>
private bool IsEndOfLine(int index)
{
char c = text[index];
return c == '\r' || (c == '\n' && (index == 0 || text[index - 1] != '\r'));
}
/// <summary>
/// Returns whether the specified character index is the end of a line.
/// </summary>
/// <param name="index">The index to check.</param>
/// <returns></returns>
[MethodImpl(MethodImplOptions.AggressiveInlining)]
private bool IsEndOfLineAggressiveInlining(int index)
{
char c = text[index];
return c == '\r' || (c == '\n' && (index == 0 || text[index - 1] != '\r'));
}
/// <summary>
/// Returns whether the specified character index is the end of a line.
/// </summary>
/// <param name="index">The index to check.</param>
/// <returns></returns>
private bool IsEndOfLineUnsafe(int index)
{
unsafe
{
fixed(char* ptr = text) {
char c = ptr[index];
return c == '\r' || (c == '\n' && (index == 0 || ptr[index - 1] != '\r'));
}
}
}
/// <summary>
/// Returns the number of the line at the specified character index.
/// </summary>
/// <param name="index">The index of the character which's line number to get.</param>
/// <returns></returns>
public int GetLineNumber(int index)
{
if(index < 0 || index > text.Length) { throw new ArgumentOutOfRangeException("index"); }
int lineNumber = 1;
int end = index;
index = 0;
while(index < end) {
if(IsEndOfLine(index)) lineNumber++;
index++;
}
return lineNumber;
}
/// <summary>
/// Returns the number of the line at the specified character index.
/// </summary>
/// <param name="index">The index of the character which's line number to get.</param>
/// <returns></returns>
public int GetLineNumber2(int index)
{
if(index < 0 || index > text.Length) { throw new ArgumentOutOfRangeException("index"); }
int lineNumber = 1;
int end = index;
index = 0;
while(index < end) {
if(IsEndOfLineAggressiveInlining(index)) lineNumber++;
index++;
}
return lineNumber;
}
/// <summary>
/// Returns the number of the line at the specified character index.
/// </summary>
/// <param name="index">The index of the character which's line number to get.</param>
/// <returns></returns>
public int GetLineNumberInline(int index)
{
if(index < 0 || index > text.Length) { throw new ArgumentOutOfRangeException("index"); }
int lineNumber = 1;
int end = index;
index = 0;
while(index < end) {
char c = text[index];
if(c == '\r' || (c == '\n' && (index == 0 || text[index - 1] != '\r'))) lineNumber++;
index++;
}
return lineNumber;
}
/// <summary>
/// Returns the number of the line at the specified character index.
/// </summary>
/// <param name="index">The index of the character which's line number to get.</param>
/// <returns></returns>
public int GetLineNumberInline2(int index)
{
if(index < 0 || index > text.Length) { throw new ArgumentOutOfRangeException("index"); }
int lineNumber = 1;
int end = index;
index = 0;
while(index < end) {
char c = text[index];
if(c == '\r' | (c == '\n' & (index == 0 || text[index - 1] != '\r'))) lineNumber++;
index++;
}
return lineNumber;
}
/// <summary>
/// Returns the number of the line at the specified character index.
/// </summary>
/// <param name="index">The index of the character which's line number to get.</param>
/// <returns></returns>
public int GetLineNumberUnsafe(int index)
{
if(index < 0 || index > text.Length) { throw new ArgumentOutOfRangeException("index"); }
int lineNumber = 1;
int end = index;
index = 0;
while(index < end) {
if(IsEndOfLineUnsafe(index)) lineNumber++;
index++;
}
return lineNumber;
}
/// <summary>
/// Returns the number of the line at the specified character index.
/// </summary>
/// <param name="index">The index of the character which's line number to get.</param>
/// <returns></returns>
public int GetLineNumberUnsafeInline(int index)
{
if(index < 0 || index > text.Length) { throw new ArgumentOutOfRangeException("index"); }
int lineNumber = 1;
int end = index;
unsafe
{
fixed(char* ptr = text) {
index = 0;
while(index < end) {
char c = ptr[index];
if(c == '\r' || (c == '\n' && (index == 0 || ptr[index - 1] != '\r'))) lineNumber++;
index++;
}
}
}
return lineNumber;
}
/// <summary>
/// Returns the number of the line at the specified character index. Utilizes a Regex.
/// </summary>
/// <param name="index">The index of the character which's line number to get.</param>
/// <returns></returns>
public int GetLineNumberRegex(int index)
{
return LineRegex.Matches(text.Substring(0, index)).Count + 1;
}
/// <summary>
/// Returns the number of the line at the specified character index. Utilizes a Regex.
/// </summary>
/// <param name="index">The index of the character which's line number to get.</param>
/// <returns></returns>
public int GetLineNumberRegex2(int index)
{
int lineNumber = 1;
MatchCollection mc = LineRegex.Matches(text);
for(int y = 0; y < mc.Count; y++) {
if(mc[y].Index >= index) break;
lineNumber++;
}
return lineNumber;
}
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With