
C#: Fastest way to compare strings


I've noticed that

string1.Length == string2.Length && string1 == string2 

on large strings is slightly faster than just

string1 == string2 

Is this true? And is it good practice to compare the lengths of large strings before comparing the strings themselves?

CoolCodeBro asked Oct 17 '13 20:10



2 Answers

The string equality operator does the length check before comparing the characters, so this trick does not save you the comparison of the contents. You might still save a few CPU cycles, because your length check assumes that the strings are not null, while the BCL must check for that. So if the lengths differ most of the time, you will short-circuit a few instructions.

I might just be wrong here, though. Maybe the operator gets inlined and the checks optimized out. Who knows for sure? (He who measures.)
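For anyone who wants to measure it, here is a rough timing sketch using Stopwatch. The class and method names (CompareTiming, Time, Plain, LengthFirst) are made up for this illustration, and calling through a delegate adds overhead of its own, so treat the numbers as a hint rather than a verdict. A dedicated benchmarking harness would give more trustworthy results.

```csharp
using System;
using System.Diagnostics;

static class CompareTiming
{
    // Hypothetical helper (not from the question): runs `compare` on (a, b)
    // `iterations` times and returns the elapsed Stopwatch ticks.
    public static long Time(Func<string, string, bool> compare, string a, string b, int iterations)
    {
        bool sink = false;                    // accumulate results so the calls are not dead code
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            sink ^= compare(a, b);
        sw.Stop();
        GC.KeepAlive(sink);
        return sw.ElapsedTicks;
    }

    // The two forms from the question. LengthFirst assumes non-null arguments.
    public static bool Plain(string a, string b) { return a == b; }
    public static bool LengthFirst(string a, string b) { return a.Length == b.Length && a == b; }

    public static void Main()
    {
        string big = new string('x', 100000);
        string sameLength = new string('x', 99999) + "y";  // differs only in the last char
        string shorter = new string('x', 50000);           // differs in length

        const int Mismatch = 1000000;  // cheap comparisons, so run many of them
        const int FullScan = 10000;    // each call scans the whole string

        // Warm-up so that JIT compilation is not part of the measurement.
        Time(Plain, big, shorter, 1000);
        Time(LengthFirst, big, shorter, 1000);

        Console.WriteLine("different length, ==           : " + Time(Plain, big, shorter, Mismatch));
        Console.WriteLine("different length, Length && == : " + Time(LengthFirst, big, shorter, Mismatch));
        Console.WriteLine("same length,      ==           : " + Time(Plain, big, sameLength, FullScan));
        Console.WriteLine("same length,      Length && == : " + Time(LengthFirst, big, sameLength, FullScan));
    }
}
```

Run it as a Release build without a debugger attached, and repeat it a few times before drawing conclusions.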

If you care about saving every cycle you can, you should probably use a different strategy in the first place. Maybe managed code is not even the right choice. Given that, I recommend using the shorter form without the additional check.

usr answered Sep 28 '22 00:09


The string equality operator (==) internally calls string.Equals, so just use string.Equals or == as provided by the framework. It is already well optimized.

It first compares the references, then the lengths, and only then the actual characters.
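That order of checks can be sketched in plain, safe C#. This is only an illustration of the sequence, not the framework's implementation; the names EqualitySketch and OrdinalEquals are invented for the example.

```csharp
using System;

static class EqualitySketch
{
    // Illustrative only: mirrors the order of checks (reference, null,
    // length, characters), not the BCL's actual optimized code.
    public static bool OrdinalEquals(string a, string b)
    {
        if (ReferenceEquals(a, b)) return true;     // same object, or both null
        if (a is null || b is null) return false;   // exactly one side is null
        if (a.Length != b.Length) return false;     // cheap rejection before touching the chars

        for (int i = 0; i < a.Length; i++)
            if (a[i] != b[i]) return false;         // first mismatch ends the scan

        return true;
    }
}
```

Since the length comparison already sits in front of the character loop, an extra string1.Length == string2.Length in the caller repeats work the framework does anyway.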

You can find the source code here

Code: (Source: http://www.dotnetframework.org/default.aspx/4@0/4@0/DEVDIV_TFS/Dev10/Releases/RTMRel/ndp/clr/src/BCL/System/String@cs/1305376/String@cs)

// Determines whether two strings match.
[ReliabilityContract(Consistency.WillNotCorruptState, Cer.MayFail)]
public override bool Equals(Object obj)
{
    if (this == null)                        //this is necessary to guard against reverse-pinvokes and
        throw new NullReferenceException();  //other callers who do not use the callvirt instruction

    String str = obj as String;
    if (str == null)
        return false;

    if (Object.ReferenceEquals(this, obj))
        return true;

    return EqualsHelper(this, str);
}

and

[System.Security.SecuritySafeCritical]  // auto-generated
[ReliabilityContract(Consistency.WillNotCorruptState, Cer.MayFail)]
private unsafe static bool EqualsHelper(String strA, String strB)
{
    Contract.Requires(strA != null);
    Contract.Requires(strB != null);
    int length = strA.Length;
    if (length != strB.Length) return false;

    fixed (char* ap = &strA.m_firstChar) fixed (char* bp = &strB.m_firstChar)
    {
        char* a = ap;
        char* b = bp;

        // unroll the loop
#if AMD64
        // for AMD64 bit platform we unroll by 12 and
        // check 3 qword at a time. This is less code
        // than the 32 bit case and is shorter
        // pathlength

        while (length >= 12)
        {
            if (*(long*)a     != *(long*)b) break;
            if (*(long*)(a+4) != *(long*)(b+4)) break;
            if (*(long*)(a+8) != *(long*)(b+8)) break;
            a += 12; b += 12; length -= 12;
        }
#else
        while (length >= 10)
        {
            if (*(int*)a != *(int*)b) break;
            if (*(int*)(a+2) != *(int*)(b+2)) break;
            if (*(int*)(a+4) != *(int*)(b+4)) break;
            if (*(int*)(a+6) != *(int*)(b+6)) break;
            if (*(int*)(a+8) != *(int*)(b+8)) break;
            a += 10; b += 10; length -= 10;
        }
#endif

        // This depends on the fact that the String objects are
        // always zero terminated and that the terminating zero is not included
        // in the length. For odd string sizes, the last compare will include
        // the zero terminator.
        while (length > 0)
        {
            if (*(int*)a != *(int*)b) break;
            a += 2; b += 2; length -= 2;
        }

        return (length <= 0);
    }
}
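The idea of comparing several characters per step can also be tried without unsafe code. The following is a rough sketch under a few assumptions: a runtime that provides ReadOnlySpan<T> and MemoryMarshal.Cast (.NET Core 2.1 or later, or the System.Memory package), non-null inputs, and a platform that tolerates unaligned 64-bit reads. ChunkedCompare and ChunkedEquals are hypothetical names. Unlike the framework code above, it handles the tail character by character instead of relying on the zero terminator.

```csharp
using System;
using System.Runtime.InteropServices;

static class ChunkedCompare
{
    // Illustrative only: compares 4 chars (one 64-bit word) per step,
    // then finishes the remaining 0 to 3 chars one at a time.
    // Assumes x and y are not null.
    public static bool ChunkedEquals(string x, string y)
    {
        if (x.Length != y.Length) return false;

        ReadOnlySpan<char> a = x.AsSpan();
        ReadOnlySpan<char> b = y.AsSpan();

        // Reinterpret the UTF-16 data as 64-bit words; leftover chars are dropped by the cast.
        ReadOnlySpan<ulong> wa = MemoryMarshal.Cast<char, ulong>(a);
        ReadOnlySpan<ulong> wb = MemoryMarshal.Cast<char, ulong>(b);

        for (int i = 0; i < wa.Length; i++)
            if (wa[i] != wb[i]) return false;

        // Tail: the chars that did not fill a whole 64-bit word.
        for (int i = wa.Length * 4; i < a.Length; i++)
            if (a[i] != b[i]) return false;

        return true;
    }
}
```

In practice the built-in == is the one to use; this only shows what "check 3 qword at a time" amounts to.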
Habib answered Sep 28 '22 00:09