
How to fix reversed arabic characters mixed with english in SQL server

I have a SQL Server database with a table column that contains Arabic and English characters in the same field, for example: Oracle اوراكل

I am looking for help splitting the Arabic characters from the English ones so that the Arabic can be reversed. The Arabic characters have no fixed position in the field: they may appear at the start, at the end, or in the middle.

Edit: The characters come from a legacy IBM mainframe application and are stored in the wrong order, i.e. they are stored in the order in which they should be displayed left to right, instead of the order in which they are read.

What is needed is to make them display correctly in other applications.

user3429592 asked Oct 20 '22 12:10


1 Answer

The problem is that you have a number of strings in the database which are, for legacy reasons, stored in visual (display) order rather than in logical (reading) order. They probably came from a character-terminal-based application that can only store characters in left-to-right order.

You can force compliant applications to display Arabic Left-to-Right by using the special Unicode character LRO U+202D: LEFT-TO-RIGHT OVERRIDE. This forces all characters to be rendered left to right regardless of how they normally would be rendered.

The effect ends at the end of the string or at the character PDF U+202C: POP DIRECTIONAL FORMATTING.

In your case all you need to do is put the LRO character at the beginning of every affected string:

select nchar(8237) + columnName as columnNameDisplay
from BadTable 

The number 8237 is the decimal equivalent of hexadecimal 202D.
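As a quick sanity check of that conversion, here is a minimal sketch (not part of the original answer) that converts the two hexadecimal code points to decimal and reads the code point back from the character that nchar() produces:

```sql
-- Hex-to-decimal check for the two control characters discussed in this answer.
select convert(int, 0x202D)  as lro_decimal,    -- 8237, LEFT-TO-RIGHT OVERRIDE
       convert(int, 0x202C)  as pdf_decimal,    -- 8236, POP DIRECTIONAL FORMATTING
       unicode(nchar(8237))  as lro_roundtrip;  -- 8237, code point of the generated character
```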

If you might be concatenating these strings with other strings which are stored correctly, you should also use the PDF character at the end:

select nchar(8237) + columnName + nchar(8236) as columnNameDisplay
from BadTable 

This tells the text rendering engine that the forced Left-To-Right sequence has come to an end.
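If several applications need the display form, one option is to wrap the expression in a view so that each of them does not have to repeat it. This is only a sketch: the view name BadTableDisplay is hypothetical, and BadTable and columnName are the same placeholders used above.

```sql
-- Hypothetical view exposing the display-ready form of the column.
-- LRO (8237) forces left-to-right rendering; PDF (8236) ends the override.
create view BadTableDisplay
as
select nchar(8237) + columnName + nchar(8236) as columnNameDisplay
from BadTable;
```

Other applications could then read columnNameDisplay from BadTableDisplay instead of reading BadTable directly. Note that a NULL in columnName still yields NULL, because concatenating with NULL returns NULL under the default settings.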

For more information see here:

  • http://www.unicode.org/reports/tr9/#Explicit_Directional_Overrides

Notes:

  • The combining characters will not combine properly
  • Text-to-speech software won't work - it will probably read it alphabetically but I am not sure.
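
If those limitations matter, the alternative is to correct the stored order itself, which is what the question originally asked for. The following is only a rough sketch under several assumptions, not a tested migration: the function name dbo.ReverseArabicRuns is hypothetical, only the basic Arabic block U+0600 to U+06FF (decimal 1536 to 1791) is treated as Arabic, spaces between two Arabic words are kept inside the run so that word order is reversed as well, and digits or punctuation embedded in Arabic text are not handled.

```sql
-- Hypothetical helper: reverses each run of Arabic characters in a string
-- that was stored in visual (left-to-right display) order.
create function dbo.ReverseArabicRuns (@s nvarchar(4000))
returns nvarchar(4000)
as
begin
    if @s is null return null;

    declare @out nvarchar(4000) = N'';    -- output built so far
    declare @run nvarchar(4000) = N'';    -- current Arabic run, already reversed
    declare @gap nvarchar(4000) = N'';    -- spaces seen after the run, not yet assigned
    declare @i   int = 1;
    declare @n   int = datalength(@s) / 2; -- character count; len() would ignore trailing spaces
    declare @ch  nchar(1);

    while @i <= @n
    begin
        set @ch = substring(@s, @i, 1);

        if unicode(@ch) between 1536 and 1791       -- assumed Arabic range U+0600..U+06FF
        begin
            set @run = @ch + @gap + @run;           -- prepending reverses the run
            set @gap = N'';
        end
        else if unicode(@ch) = 32 and datalength(@run) > 0
            set @gap = @gap + @ch;                  -- space may sit between two Arabic words
        else
        begin
            set @out = @out + @run + @gap + @ch;    -- run ended: flush it, then the new character
            set @run = N'';
            set @gap = N'';
        end

        set @i += 1;
    end

    return @out + @run + @gap;                      -- flush whatever is still pending
end
```

It could be tried out with something like `select columnName, dbo.ReverseArabicRuns(columnName) from BadTable` on a few rows before any data is updated, since the assumptions above may not match what the mainframe application actually stored.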

Further Information

Characters should be stored in the order they are written or read, not in the order they are displayed. So for example, the string:

test اختبار test

should be stored as

01  t
02  e
03  s
04  t
05  (space)
06  ا
07  خ
08  ت
09  ب
10  ا
11  ر
12  (space)
13  t
14  e
15  s
16  t

Notice that the leftmost Arabic character as displayed is stored at position 11 (substring(@var, 11, 1)), and the rightmost one as displayed is at position 6 (substring(@var, 6, 1)). If you simply count the character positions as they are displayed from left to right, the Arabic portion appears reversed compared to how it is stored. That is because this portion is meant to be read from right to left, and is therefore displayed right to left.

To fix your problem you first need to check: Are the strings stored wrongly OR are they stored correctly but displayed wrongly?
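
One way to answer that question is to ignore how the string is rendered and list the stored characters one position at a time. The sketch below does this for the sample string from this answer; the variable @var and the CTE name positions are illustrative, and in practice the value would be read from the affected column.

```sql
-- Sample value; replace with a value selected from the affected table.
declare @var nvarchar(100) = N'test اختبار test';

-- Generate one row per character position, then show each character's code point.
with positions as (
    select 1 as pos
    union all
    select pos + 1 from positions where pos < len(@var)
)
select pos,
       substring(@var, pos, 1)          as ch,
       unicode(substring(@var, pos, 1)) as code_point
from positions
order by pos
option (maxrecursion 0);   -- lift the default limit of 100 recursion levels
```

If the Arabic letters come out in reading order (for this sample, ا with code point 1575 first and ر with code point 1585 last), the string is stored correctly and the display side is at fault. If they come out in the opposite order, the stored order is wrong.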

Ben answered Oct 27 '22 10:10