String concatenation containing Arabic and Western characters

I'm trying to concatenate several strings that contain both Arabic and Western characters (mixed in the same string). The problem is that the result is a String that is most likely semantically correct but different from what I want to obtain, because the displayed order of the characters is altered by the Unicode Bidirectional Algorithm. Basically, I just want to concatenate them as if they were all LTR, ignoring the fact that some are RTL: a sort of "agnostic" concatenation.

I'm not sure if I was clear in my explanation, but I don't think I can do it any better.

Hope someone can help me.

Kind regards,

Carlos Ferreira

BTW, the strings are being obtained from the database.

EDIT

[Image: the two source strings and the concatenated result]

The first 2 Strings are the strings I want to concatenate and the third is the result.

EDIT 2

Actually, the concatenated String is a little different from the one in the image; it got altered during the copy+paste. The 1 is after the first A, not immediately before the second A.

Carlos Ferreira, asked May 30 '11 14:05


2 Answers

You can embed bidi regions using Unicode format control code points:

  • Left-to-right embedding (U+202A)
  • Right-to-left embedding (U+202B)
  • Pop directional formatting (U+202C)

So in Java, to embed an RTL language like Arabic in an LTR language like English, you would do

myEnglishString + "\u202B" + myArabicString + "\u202C" + moreEnglish 

and to do the reverse

myArabicString + "\u202A" + myEnglishString + "\u202C" + moreArabic 

See Bidirectional General Formatting for more details, or the Unicode specification chapter on "Directional Formatting Codes" for the source material.
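The two concatenation patterns above can be wrapped in small helpers so the control characters are not scattered through the code. This is only a minimal sketch: the class name `BidiConcat` and the methods `embedRtl`, `embedLtr`, and `concatLtrWithRtl` are hypothetical names chosen for illustration, not part of any library.

```java
// Hypothetical helper class (illustration only, not a library API).
class BidiConcat {
    // Directional formatting characters listed in the answer above.
    static final char LRE = '\u202A'; // left-to-right embedding
    static final char RLE = '\u202B'; // right-to-left embedding
    static final char PDF = '\u202C'; // pop directional formatting

    // Wraps s so that it is treated as a right-to-left embedded region.
    static String embedRtl(String s) {
        return RLE + s + PDF;
    }

    // Wraps s so that it is treated as a left-to-right embedded region.
    static String embedLtr(String s) {
        return LRE + s + PDF;
    }

    // LTR text, then an embedded RTL region, then more LTR text.
    static String concatLtrWithRtl(String before, String rtl, String after) {
        return before + embedRtl(rtl) + after;
    }

    public static void main(String[] args) {
        String arabic = "\u0645\u0631\u062D\u0628\u0627"; // sample Arabic word
        String joined = concatLtrWithRtl("Name: ", arabic, " (id 1)");
        // The control characters are invisible but are part of the string,
        // so the length grows by two compared with plain concatenation.
        System.out.println(joined.length());
    }
}
```

Note that the control characters stay in the resulting string, so they should probably be added only at the point of display rather than written back to the database.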

Mike Samuel, answered Oct 02 '22 22:10


It's very likely that you need to insert Unicode directional formatting codes into your string to get it to display correctly. For details, see the Directional Formatting Codes section of the Unicode Bidirectional Algorithm specification.

Maybe the Bidi class (java.text.Bidi) can help you determine the correct sequence, as it implements the Unicode Bidirectional Algorithm.
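As a rough sketch of how `java.text.Bidi` could be used to inspect a string before deciding where to insert formatting codes, the snippet below lists the directional runs the algorithm finds. The class name `BidiInspect` and its methods are hypothetical; only the `Bidi` calls themselves come from the JDK.

```java
import java.text.Bidi;

// Hypothetical helper class (illustration only).
class BidiInspect {
    // True when the text contains both LTR and RTL runs.
    static boolean isMixed(String text) {
        Bidi bidi = new Bidi(text, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        return bidi.isMixed();
    }

    // Describes each directional run as "start..limit LTR|RTL", one per line.
    static String describeRuns(String text) {
        Bidi bidi = new Bidi(text, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bidi.getRunCount(); i++) {
            // Even embedding levels are left-to-right, odd levels right-to-left.
            String dir = (bidi.getRunLevel(i) % 2 == 0) ? "LTR" : "RTL";
            sb.append(bidi.getRunStart(i)).append("..")
              .append(bidi.getRunLimit(i)).append(' ').append(dir).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String mixed = "abc \u0645\u0631\u062D\u0628\u0627 1"; // Latin, Arabic, digit
        System.out.println(isMixed(mixed));
        System.out.print(describeRuns(mixed));
    }
}
```

The run boundaries reported here are character offsets into the logical (stored) string, which is where any embedding codes would be inserted.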

MicSim, answered Oct 02 '22 23:10