
integer and string comparison at system level

How are integers and strings compared at a low level? For example, when we write

int a = 11;
int b = 12;

compare(a,b); //Just an example comparison, not in any particular language.

And

String a = "11";
String b = "12";

compare(a,b);

Now what I am asking is: what is the system-level difference between these two comparisons? The question is not about any particular language; it's a generic issue. It is also not about string-to-integer conversion or comparison, or vice versa. I know that the answers might differ across platforms and languages, but as I have no clue about this, I am just asking a generic question.

And why are integer comparisons always considered faster than string comparisons?

buch11 asked Jul 19 '11 01:07


2 Answers

At the simplest level, both strings and integers come down to comparing raw bytes; the difference is how many bytes there are and whether they can be compared in one step.

So for the int example, that becomes the single CPU instruction:

cmp a, b

This runs very fast (assuming 32-bit ints on a 32-bit or wider processor): it is a single comparison whose operands fit in CPU registers.

Strings, however, are more complex. At their simplest, it looks like:

foreach ( character c in string a, character d in string b )
    cmp c, d

and may have to loop over the entire string, character by character, stopping early only at the first mismatch. If the strings are of different lengths, it has to handle that too (two ints of the same type always have the same size).
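The loop above can be sketched in C roughly as follows. This is a minimal illustration assuming NUL-terminated strings; compare_str is a made-up name, and real library versions of strcmp are usually far more optimized than this.

```c
#include <assert.h>
#include <stddef.h>

/* Three-way comparison of two NUL-terminated strings, byte by byte.
   compare_str is a hypothetical name; it behaves like strcmp but
   normalizes the result to -1, 0, or 1. */
int compare_str(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);

    /* Walk both strings while the bytes match and neither has ended. */
    while (*a != '\0' && *a == *b) {
        a++;
        b++;
    }

    /* Compare as unsigned char so bytes above 0x7F order consistently.
       A shorter string reaches its terminator (0) first and sorts lower. */
    unsigned char ca = (unsigned char)*a;
    unsigned char cb = (unsigned char)*b;
    return (ca > cb) - (ca < cb);
}
```

With this sketch, compare_str("11", "12") has to read two bytes from each string before it finds the difference, and the work grows with the length of the common prefix.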

At a more complex level, with locale and various character sets, each string character may be 2-4 bytes and some characters (with accents and such) may compare as equal to each other despite having different byte values. Far more handling and processing is involved, and more work almost always means slower.
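To make the accent case concrete, here is a small C sketch assuming UTF-8 text. The two constants and the helper bytes_equal are names invented for this example. Both byte sequences render as "é", yet a plain byte comparison treats them as different; treating them as equal needs Unicode normalization or locale-aware collation, which is not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Two UTF-8 encodings that both render as "é":
   precomposed U+00E9, and 'e' followed by combining acute U+0301. */
const char E_ACUTE_PRECOMPOSED[] = "\xC3\xA9";
const char E_ACUTE_DECOMPOSED[]  = "e\xCC\x81";

/* Raw byte equality, which is all a plain strcmp-style comparison sees.
   bytes_equal is a hypothetical helper name for this sketch. */
int bytes_equal(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    return strcmp(a, b) == 0;
}
```

Here bytes_equal(E_ACUTE_PRECOMPOSED, E_ACUTE_DECOMPOSED) returns 0, because the first string is 2 bytes long and the second is 3, even though a reader would call them the same character.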

The exact behavior varies by locale, character set, and language. Some languages (C#, for example) store strings with a length, while others (C) simply store an array of characters. Other languages may be designed for string processing or have optimized libraries to handle it, which can decrease the cost.
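As a rough sketch of why storing the length can help, here is a hypothetical length-prefixed string in C. The struct lp_string and the function lp_equal are invented for illustration and do not mirror any particular runtime's layout. An equality check can reject strings of different lengths with one integer comparison, before touching any character data.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A hypothetical length-prefixed string, loosely modeled on runtimes
   that keep a length next to the character data. */
struct lp_string {
    size_t len;        /* number of bytes in data; no terminator needed */
    const char *data;
};

/* Equality check: a length mismatch is decided with one integer
   comparison; only equal-length strings need a byte-by-byte pass. */
int lp_equal(const struct lp_string *a, const struct lp_string *b)
{
    assert(a != NULL && b != NULL);
    if (a->len != b->len)
        return 0;
    return memcmp(a->data, b->data, a->len) == 0;
}
```

A plain C string has no such shortcut: finding out that "11" and "110" differ in length already requires walking the bytes.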

Interestingly, in theory, when working with ASCII strings, comparing strings of 3 characters or fewer could be roughly as fast as comparing ints, because 3 characters plus a terminator fit in the same 4 bytes as a 32-bit int. In that case, the cost has more to do with the amount of memory involved (for ASCII, strcmp amounts to a byte-wise memory comparison much like memcmp, which is approximately what == would do anyway). This may also hold for empty (0-length) strings in languages that store the string length at the beginning, as they can simply compare the lengths (which may be ints).
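One way to picture the short-string case is to pack the bytes into a 32-bit word and compare the words. This is only a sketch under the assumption of ASCII strings of at most 3 characters; pack3 is a hypothetical helper, not something a standard library provides.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Pack an ASCII string of at most 3 characters, plus zero padding,
   into one 32-bit word. pack3 is a hypothetical helper name. */
uint32_t pack3(const char *s)
{
    assert(s != NULL && strlen(s) <= 3);

    unsigned char buf[4] = {0, 0, 0, 0};
    memcpy(buf, s, strlen(s));

    /* Assemble most-significant byte first, so integer order matches
       byte-by-byte string order regardless of host endianness. */
    return ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];
}
```

Once packed, pack3("11") < pack3("12") is a single integer comparison. The packing step itself still costs something, so this only pays off if the packed form is what gets stored and compared repeatedly.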

ssube answered Nov 10 '22 07:11


Integers are stored as binary values: a single fixed-size set of 1s and 0s taking up a few bytes (typically 4 for a 32-bit int, depending on the platform and language).

Strings are stored as one character per digit, each character with its own bit pattern in its own byte.

So in your example, each string needs one byte per digit (plus a terminator or length field, depending on the language), and that storage grows with the number of digits, whereas an int stays the same size however large the value.
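The exact numbers depend on the language and platform. As a rough check in C, assuming a 32-bit integer type and NUL-terminated strings, the two helpers below (names made up for this sketch) report how many bytes each representation occupies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bytes a 32-bit integer occupies, whatever value it holds. */
size_t int32_storage(void)
{
    return sizeof(int32_t);
}

/* Bytes a NUL-terminated C string occupies: one per character
   plus one for the terminator. */
size_t c_string_storage(const char *s)
{
    assert(s != NULL);
    return strlen(s) + 1;
}
```

The integer size is fixed, while c_string_storage grows by one byte for every extra digit in the string.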

Randy answered Nov 10 '22 07:11