Currency Symbol placement in .NET strings

I'm developing some code to present currency symbols as part of a label on my application, and I have a reference list of currency symbols in Unicode hex format. In my code I am formatting the currency as follows:

(currency symbol) (decimal string) (currency description)

This approach works fine for most symbols. However, some symbols are shifted to the right of the decimal value even when I explicitly place them to the left. In the debugger I see this behavior in the strings themselves, so it does not appear to be caused by the rendering in the presentation layer. The following code is a minimal case that demonstrates the problem:

string rialSymbol = "\ufdfc"; // U+FDFC RIAL SIGN
string amount = "123.45";
string description = "Rials";
string plainConcat = rialSymbol + " " + amount + " " + description;
Debug.WriteLine(plainConcat);

The debug output (which also matches what's seen in the application UI) is as follows:

123.45 (rial symbol) Rials

(Note: the symbol is to the right of the decimal value, not to the left as specified.)

I have tried many approaches and varieties of string formatting, culture formatting, etc., but nothing addresses this issue. How can I enforce the placement of the Unicode character so that the framework does not decide where the symbol goes relative to the decimal value? This works with most other characters, so why do the Rial sign (and a few others) cause this behavior?

DMG asked Nov 24 '10 22:11


1 Answer

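U+FDFC (RIAL SIGN) is a strong right-to-left Unicode character (bidi class AL, Arabic letter). It's meant to be embedded in right-to-left text. Your label mixes left-to-right and right-to-left text, so the bidirectional algorithm reorders it for display; the characters in the string itself stay in the order you wrote them.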
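
To check that the reordering happens only at display time, you can print the code points in storage order instead of the rendered text. This is a minimal sketch; the `BidiDemo` class and `ToCodePoints` helper are hypothetical names made up for illustration, not framework APIs:

```csharp
using System;
using System.Linq;

public static class BidiDemo
{
    // Hypothetical helper: lists each UTF-16 code unit in storage (logical) order.
    public static string ToCodePoints(string text)
    {
        return string.Join(" ", text.Select(c => "U+" + ((int)c).ToString("X4")));
    }

    public static void Main()
    {
        string plainConcat = "\ufdfc" + " " + "123.45" + " " + "Rials";
        // The output starts with U+FDFC U+0020 U+0031, so the symbol is still first.
        Console.WriteLine(ToCodePoints(plainConcat));
    }
}
```

If the first code point is U+FDFC, the string is stored exactly as concatenated, and what the debugger and the UI show is the same string after bidirectional reordering.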

From Wikipedia:

In Unicode encoding, all non-punctuation characters are stored in writing order. This means that the writing direction of characters is stored within the characters. If this is the case, the character is called "strong". Punctuation characters however, can appear in both LTR and RTL scripts. They are called "weak" characters because they do not contain any directional information. So it is up to the software to decide in which direction these "weak" characters will be placed. Sometimes (in mixed-directions text) this leads to display errors, caused by the bidi-algorithm that runs through the text and identifies LTR and RTL strong characters and assigns a direction to weak characters, according to the algorithm's rules.

In the algorithm, each sequence of concatenated strong characters is called a "run". A weak character that is located between two strong characters with the same orientation will inherit their orientation. A weak character that is located between two strong characters with a different writing direction, will inherit the main context's writing direction (in an LTR document the character will become LTR, in an RTL document, it will become RTL). If a "weak" character is followed by another "weak" character, the algorithm will look at the first neighbouring "strong" character. Sometimes this leads to unintentional display errors. These errors are corrected or prevented with "pseudo-strong" characters. Such Unicode control characters are called marks. The mark U+200E (left-to-right mark) or U+200F (right-to-left mark) is to be inserted into a location to make an enclosed weak character inherit its writing direction.

For example, to correctly display the U+2122 ™ trade mark sign for an English name brand (LTR) in an Arabic (RTL) passage, an LRM mark is inserted after the trademark symbol if the symbol is not followed by LTR text. If the LRM mark is not added, the weak character ™ will be neighbored by a strong LTR character and a strong RTL character. Hence, in an RTL context, it will be considered to be RTL, and displayed in an incorrect order.

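So the solution is to add a U+200E left-to-right mark after right-to-left currency symbols. The mark is a strong left-to-right character, so the space and digits that follow it resolve as left-to-right instead of joining the symbol's right-to-left run: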

string rialSymbol = "\ufdfc\u200e"; // U+FDFC RIAL SIGN followed by U+200E LEFT-TO-RIGHT MARK
string amount = "123.45";
string description = "Rials";
string plainConcat = rialSymbol + " " + amount + " " + description;
Debug.WriteLine(plainConcat);
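
If the symbols come from a reference list, the mark can be appended only where it is needed. The sketch below is one way to do that, under stated assumptions: `CurrencyLabel`, `Format`, and `IsProbablyRightToLeft` are hypothetical names, and the range check is a rough approximation of "right-to-left" based on a few Unicode blocks, not a lookup of the real bidi class of each character.

```csharp
using System;

public static class CurrencyLabel
{
    private const char LeftToRightMark = '\u200E';

    // Assumption: a coarse block-based test, not the real Unicode bidi classes.
    // It also matches digits and combining marks that live in these blocks.
    private static bool IsProbablyRightToLeft(char c)
    {
        return (c >= '\u0590' && c <= '\u08FF')   // Hebrew, Arabic, Syriac, Thaana, ...
            || (c >= '\uFB1D' && c <= '\uFDFF')   // Hebrew and Arabic presentation forms
            || (c >= '\uFE70' && c <= '\uFEFC');  // Arabic Presentation Forms-B
    }

    // Builds "(symbol) (amount) (description)", adding U+200E after the symbol
    // when its last character looks right-to-left.
    public static string Format(string symbol, string amount, string description)
    {
        bool needsMark = symbol.Length > 0
            && IsProbablyRightToLeft(symbol[symbol.Length - 1]);
        string anchored = needsMark ? symbol + LeftToRightMark : symbol;
        return anchored + " " + amount + " " + description;
    }

    public static void Main()
    {
        // Rendered text depends on the console or UI; the stored order is fixed.
        Console.WriteLine(Format("\ufdfc", "123.45", "Rials"));
        Console.WriteLine(Format("$", "123.45", "Dollars"));
    }
}
```

Left-to-right symbols such as `$` pass through unchanged, so the same call can be used for every entry in the list.
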

dtb answered Sep 30 '22 13:09