I'm trying to concatenate several strings that each contain both Arabic and Western characters (mixed in the same string). The problem is that the resulting String is, most likely, semantically correct, but it looks different from what I want, because the Unicode Bidirectional Algorithm reorders the characters when the string is displayed. Basically, I just want to concatenate the strings as if they were all LTR, ignoring the fact that some parts are RTL: a sort of "agnostic" concatenation.
I'm not sure if I was clear in my explanation, but I don't think I can do it any better.
Hope someone can help me.
Kind regards,
Carlos Ferreira
BTW, the strings are being obtained from the database.
EDIT
The first 2 Strings are the strings I want to concatenate and the third is the result.
EDIT 2
Actually, the concatenated String is a little different from the one in the image; it got altered during the copy and paste. The 1 comes after the first A, not immediately before the second A.
C++ inherits the strcat() function from the C standard library (declared in <cstring>). strcat() takes two null-terminated char arrays and appends the second to the end of the first, so the destination array must be large enough to hold the combined result. For example, two char arrays str1 and str2 of 100 characters each could serve as buffers. In idiomatic C++, std::string and its + operator are usually preferred.
In formal language theory and computer programming, string concatenation is the operation of joining character strings end-to-end. For example, the concatenation of "snow" and "ball" is "snowball".
You concatenate strings by using the + operator. For string literals and string constants, concatenation occurs at compile time; no run-time concatenation occurs. For string variables, concatenation occurs only at run time.
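You can embed bidi regions using Unicode format control code points:
U+202A LEFT-TO-RIGHT EMBEDDING (LRE)
U+202B RIGHT-TO-LEFT EMBEDDING (RLE)
U+202C POP DIRECTIONAL FORMATTING (PDF)
So in Java, to embed an RTL language like Arabic in an LTR language like English, you would do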
myEnglishString + "\u202B" + myArabicString + "\u202C" + moreEnglish
and to do the reverse
myArabicString + "\u202A" + myEnglishString + "\u202C" + moreArabic
See Bidirectional General Formatting for more details, or the Unicode specification chapter on "Directional Formatting Codes" for the source material.
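The two one-liners above can be wrapped in small helpers so the control characters are not scattered through the code. This is only a sketch: the class name BidiEmbed, the method names, and the sample strings are made up for illustration and are not part of any library.

```java
// Hypothetical helper class; names are illustrative, not a library API.
final class BidiEmbed {
    static final char LRE = '\u202A'; // LEFT-TO-RIGHT EMBEDDING
    static final char RLE = '\u202B'; // RIGHT-TO-LEFT EMBEDDING
    static final char PDF = '\u202C'; // POP DIRECTIONAL FORMATTING

    // Wraps s so it is laid out as an RTL region inside surrounding LTR text.
    static String embedRtl(String s) {
        return RLE + s + PDF;
    }

    // Wraps s so it is laid out as an LTR region inside surrounding RTL text.
    static String embedLtr(String s) {
        return LRE + s + PDF;
    }

    public static void main(String[] args) {
        // Sample data (assumed): an Arabic word written with escapes.
        String arabic = "\u0645\u0631\u062D\u0628\u0627";
        String result = "Name: " + embedRtl(arabic) + " (id 1A)";
        // The control characters are invisible but do count toward length().
        System.out.println(result.length());
    }
}
```

Note that the wrapped strings are two characters longer than the originals, which matters if the result is compared or written back to the database.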
It's very likely that you need to insert Unicode directional formatting codes into your string to get it to display correctly. For details, see the Directional Formatting Codes section of the Unicode Bidirectional Algorithm specification.
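If the goal really is to have every character shown left to right in storage order, one of those formatting codes, U+202D LEFT-TO-RIGHT OVERRIDE (LRO), forces that until the matching U+202C. A minimal sketch, assuming the renderer honours the override; the class and method names here are hypothetical:

```java
// Hypothetical helper; assumes the display layer honours bidi overrides.
final class LtrOverride {
    static final char LRO = '\u202D'; // LEFT-TO-RIGHT OVERRIDE
    static final char PDF = '\u202C'; // POP DIRECTIONAL FORMATTING

    // Concatenates the parts in order and wraps the whole result in an
    // LTR override, so characters are displayed in logical (storage) order.
    static String concatForcedLtr(String... parts) {
        StringBuilder sb = new StringBuilder();
        sb.append(LRO);
        for (String part : parts) {
            sb.append(part);
        }
        sb.append(PDF);
        return sb.toString();
    }
}
```

Keep in mind that Arabic text forced into LTR order will read backwards to an Arabic reader, so this only makes sense for codes or identifiers rather than prose.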
Maybe the java.text.Bidi class can help you determine the correct sequence, as it implements the Unicode Bidirectional Algorithm.
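As a rough sketch of how java.text.Bidi could be used for this, the helper below lists the directional runs of a string and their embedding levels (even levels are LTR, odd levels are RTL). The class name BidiInspect and its methods are made up for illustration; only the Bidi calls come from the JDK.

```java
import java.text.Bidi;

// Hypothetical inspection helper built on java.text.Bidi.
final class BidiInspect {

    // True if the text contains both LTR and RTL runs.
    static boolean isMixed(String text) {
        Bidi bidi = new Bidi(text, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        return bidi.isMixed();
    }

    // Returns one line per run: "start-limit level N".
    static String describeRuns(String text) {
        Bidi bidi = new Bidi(text, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bidi.getRunCount(); i++) {
            sb.append(bidi.getRunStart(i))
              .append('-')
              .append(bidi.getRunLimit(i))
              .append(" level ")
              .append(bidi.getRunLevel(i))
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Sample data (assumed): Latin text with an Arabic word in the middle.
        System.out.print(describeRuns("abc \u0645\u0631\u062D\u0628\u0627 def"));
    }
}
```

Checking isMixed() before and after concatenation is a cheap way to see which of your database strings actually need directional formatting codes.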