
How to explain sorting (numerical, lexicographical and collation) with examples to non-technical testers?

I need to explain the differences using French and Spanish first and last names. Any pointers are appreciated. I did a Google search but the results are not satisfactory.

Aravind Yarram asked Jul 25 '11 00:07



1 Answer

Here are some explanations:

Lexicographical

In this case, you sort text character by character, without giving numbers any special treatment. Digits are just "letters"; a run of digits has no combined numeric meaning.

This means that the text "ABC123" is sorted as the letters A, B, C, 1, 2 and 3, not as A, B, C and then the number 123.

This has the unfortunate consequence that texts that look like they should be ordered as numbers are not.

For instance, when sorting these two:

ABC90
ABC100

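You might expect ABC90 to be sorted before ABC100 because 90 comes before 100, but that's not how lexicographical ordering works. It compares character by character: the first three characters are equal, so it compares the 9 with the 1, and because 1 comes before 9, ABC100 is placed before ABC90.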
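To see this for yourself, here is a minimal Python sketch (Python's default string sort compares character codes, which is one common form of lexicographical ordering):

```python
items = ["ABC90", "ABC100"]

# The default sort compares strings one character at a time.
# "A", "B", "C" are equal in both, so the decision is made on "9" vs "1".
lexicographic = sorted(items)

print(lexicographic)  # ['ABC100', 'ABC90']
```

The digit "1" sorts before the digit "9", so ABC100 ends up first even though 100 is the larger number.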

Natural Ordering

This is the ordering that makes the above example work as expected, by sorting 90 before 100. Natural ordering switches to numeric comparison for a portion of the text when it encounters a run of digits at the same position in both texts.
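One way to sketch natural ordering in Python is to split each text into digit runs and non-digit runs, and compare the digit runs as numbers. The helper name `natural_key` is made up for this illustration, and real natural-sort implementations handle more cases (signs, decimals, letter case):

```python
import re

def natural_key(text):
    # re.split with a capturing group keeps the digit runs in the result:
    # "ABC90" -> ["ABC", "90", ""]. Odd positions are always digit runs.
    parts = re.split(r"(\d+)", text)
    # Turn the digit runs into integers so 90 and 100 compare as numbers.
    return [int(part) if index % 2 else part for index, part in enumerate(parts)]

items = ["ABC100", "ABC90"]
natural = sorted(items, key=natural_key)

print(natural)  # ['ABC90', 'ABC100']
```

Here the text part "ABC" is equal in both items, so the decision falls to the numbers 90 and 100, and 90 wins.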

Collation-based ordering

This one handles things like variations between languages.

Normally, lexicographical ordering compares one letter to another letter and determines their order according to the "value" of the letter, usually its character code. This can have some strange effects.

For instance, how do you think the following two strings would be ordered?

ABCTEN
ABCßEN

Since ß has a higher ordinal value (i.e. its "place" in the Unicode character set) than T, a plain lexicographical sort produces the order shown above. If you look in the Unicode chart that contains all the letters, you will find that T has the value 84 (U+0054) and ß has the value 223 (U+00DF).

However, in German, ß is treated as SS for sorting purposes, so the two texts should be compared as if they were:

ABCTEN
ABCSSEN

and thus their order should be reversed, since S comes before T.
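The ß example can be imitated in Python with a hand-written sort key. This is only a stand-in for a real German collation, which would normally come from a collation library or the operating system's locale support; the helper name `german_like_key` is made up for this illustration and handles nothing beyond ß and letter case:

```python
def german_like_key(text):
    # Assumption for this sketch: expand ß to "ss" and ignore letter case.
    # A real collation covers many more rules than this.
    return text.replace("ß", "ss").casefold()

items = ["ABCTEN", "ABCßEN"]

by_code_point = sorted(items)                       # plain lexicographical sort
by_collation = sorted(items, key=german_like_key)   # ß compared as "ss"

print(by_code_point)  # ['ABCTEN', 'ABCßEN']
print(by_collation)   # ['ABCßEN', 'ABCTEN']
```

The same two texts come out in opposite orders depending on whether ß is compared by its character code or as "ss".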

This is collation-based ordering. You pick a collation that describes the language or context in which the texts should be compared. This gives you the order that readers of each language expect.

For instance, in Norwegian, the letters Æ, Ø and Å are ranked directly after Z. In other languages (I forget which), Æ should be ranked just after E, Ø just after O and Å just after A. The collation dictates this.
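The Norwegian rule can be sketched the same way, by ranking each letter by its position in the Norwegian alphabet rather than by its character code. The names `NORWEGIAN_ALPHABET` and `norwegian_like_key` and the sample words are made up for this illustration; a real application would rely on a proper collation rather than a hand-written table:

```python
# The 26 basic letters followed by æ, ø, å, as described above.
NORWEGIAN_ALPHABET = "abcdefghijklmnopqrstuvwxyzæøå"

def norwegian_like_key(text):
    # Rank each letter by its position in the alphabet string.
    # Characters not in the table are placed after all known letters.
    ranks = []
    for char in text.lower():
        position = NORWEGIAN_ALPHABET.find(char)
        ranks.append(position if position >= 0 else len(NORWEGIAN_ALPHABET))
    return ranks

words = ["Åse", "Øyvind", "Ægir", "Zoe"]

by_code_point = sorted(words)
by_collation = sorted(words, key=norwegian_like_key)

print(by_code_point)  # ['Zoe', 'Åse', 'Ægir', 'Øyvind']
print(by_collation)   # ['Zoe', 'Ægir', 'Øyvind', 'Åse']
```

Sorting by character code happens to put Å before Æ and Ø, while the alphabet-based key gives the Æ, Ø, Å order described above.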

Lasse V. Karlsen answered Nov 13 '22 21:11
