I am trying to remove the last character of a string in a "right-to-left" language. When I do, however, the last character wraps to the beginning of the string.
e.g.
ותֵיהֶם]׃
becomes
ותֵיהֶם]
I know that this is a fundamental issue with how I'm handling the R-T-L paradigm, but if someone could help me think through it, I'd very much appreciate it.
CODE
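with open(r"file.txt", "r") as f:
    for line in f:
        # Python 2: decode the raw bytes into a unicode string
        line = unicode(line, 'utf-8')
        the_text = line.split('\t')[1]
        # replace() returns a new string, so the result must be assigned
        the_text = the_text.replace(u'\u05C3', '')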
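You can remove a character from a Python string using replace() or translate(). Both methods return a new string with the character replaced or deleted. Python strings are immutable, so the original is left unchanged and the result has to be assigned to a variable.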
Using rstrip() to remove the last character: rstrip() is a built-in string method that returns a copy of the string with trailing characters removed. Called with no argument, it strips trailing whitespace. Called with a string argument, it strips every trailing character that appears in that argument, so s.rstrip(s[-1]) removes the final character along with any adjacent repeats of it. To drop exactly one character, slice instead: s[:-1].
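A minimal sketch comparing the approaches mentioned above (replace(), translate(), rstrip(), and slicing), written for Python 3. The sample string is a shortened, hypothetical stand-in for the question's text, spelled with escapes so that the logical order is unambiguous.

```python
SOF_PASUQ = "\u05C3"  # HEBREW PUNCTUATION SOF PASUQ

# Logical order: vav, tav, ']', sof pasuq (hypothetical sample)
text = "\u05D5\u05EA]" + SOF_PASUQ

# replace(): removes every occurrence and returns a new string
replaced = text.replace(SOF_PASUQ, "")

# translate(): mapping a code point to None deletes it
translated = text.translate({ord(SOF_PASUQ): None})

# rstrip(chars): strips all trailing characters found in chars
stripped = text.rstrip(SOF_PASUQ)

# slicing: drops exactly one trailing character, whatever it is
sliced = text[:-1]

# The original string is untouched because str is immutable
assert text.endswith(SOF_PASUQ)
```

All four results are the same string here. They differ only when the character also occurs elsewhere in the string or is repeated at the end.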
In JavaScript, use the replace() method with a regular expression to remove a specific character from a string. This example removes comma ( , ) characters from the start and end of the string: var myString = ',codex,world,'; myString = myString.replace(/^,+|,+$/g, '');
Some characters in Unicode are always LTR, some are always RTL, and some can be either depending on their surrounding context. In addition, the display context for bidirectional text will have a "predominant" directionality (e.g. a text editor configured for mainly-English text would be predominantly LTR and have a ragged right margin, one configured for mainly-Hebrew would be predominantly RTL with a ragged left margin).
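The directionality classes described above can be inspected from Python with the standard unicodedata module. This is a small sketch; the helper name bidi_classes is made up for illustration. "L" and "R" are strong left-to-right and right-to-left, "ON" is "other neutral" (direction taken from context), and "NSM" is a non-spacing mark such as a Hebrew vowel point.

```python
import unicodedata


def bidi_classes(text):
    """Return (character name, bidi class) for each code point in text."""
    return [(unicodedata.name(ch), unicodedata.bidirectional(ch)) for ch in text]


# Latin A, Hebrew final mem, closing bracket, sof pasuq, segol (vowel point)
sample = "A\u05DD]\u05C3\u05B6"

for name, cls in bidi_classes(sample):
    print(f"{cls:>3}  {name}")
```

The closing bracket reports a neutral class, which is why its visual position depends on its neighbours, while the sof pasuq is strongly RTL.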
It looks like what has happened here is that when a closing square bracket character appears between two RTL characters it is rendered in its RTL form (your first example), but when it appears between an RTL and an LTR character (or at the end of the string - basically, somewhere where it doesn't have other characters of the same directionality on both sides) it is considered to be part of whichever run of text matches the predominant direction. If you try dragging your mouse over the string to select the characters, you'll see that logically the closing ] still follows the ֶם even if visually it appears to have moved.
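One way to confirm that only the display changed is to look at the logical order in Python rather than at the rendered text. The sketch below rebuilds the question's example from escapes (my reading of the code points, so treat the exact vowel points as an assumption) and checks what the last character is before and after the removal.

```python
import unicodedata

# Assumed code points for the example: vav, tav, tsere, yod, he, segol,
# final mem, ']', sof pasuq (logical order)
text = "\u05D5\u05EA\u05B5\u05D9\u05D4\u05B6\u05DD]\u05C3"

trimmed = text.replace("\u05C3", "")

# The bracket is still the final code point in memory
last_before = unicodedata.name(text[-1])
last_after = unicodedata.name(trimmed[-1])
print(last_before)
print(last_after)

# Dump the code points to see the stored order independent of rendering
print([f"U+{ord(ch):04X}" for ch in trimmed])
```

If the last name printed for the trimmed string is the bracket, the removal worked and the "wrapping" is purely a rendering effect.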
If the second-to-last character in your string were also a Hebrew character (or other strongly RTL character) rather than a ], or if the display context was predominantly RTL, then it would appear where you expect it to.
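If the display context cannot be changed, one possible workaround (not part of the original answer, so treat it as an assumption) is to add invisible Unicode directional characters so the bracket again has RTL context around it. The sketch below shows two options: appending U+200F RIGHT-TO-LEFT MARK, or wrapping the text in U+2067 RIGHT-TO-LEFT ISOLATE and U+2069 POP DIRECTIONAL ISOLATE. The function names are hypothetical, and whether a given editor or terminal honours these characters varies.

```python
RLM = "\u200F"  # RIGHT-TO-LEFT MARK: invisible, strongly RTL
RLI = "\u2067"  # RIGHT-TO-LEFT ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def append_rlm(text):
    """Give a trailing neutral character a strong RTL neighbour on its right."""
    return text + RLM


def isolate_rtl(text):
    """Mark the whole string as an RTL run, independent of the surrounding text."""
    return RLI + text + PDI


trimmed = "\u05D5\u05EA]"  # vav, tav, ']' (hypothetical sample)
for_display = append_rlm(trimmed)
isolated = isolate_rtl(trimmed)
```

These characters become part of the string, so it is usually safer to add them only at display time and keep the stored text clean.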