I want to add markup to Urdu text, which is written right to left. I am trying to use gsub for this, but nothing I have tried so far produces the desired output:
text <- "یہ جملہ ایک مثال کے لیے استعمال کیا جا رہا ہے"
pattern <- "کیا جا"
replaceWith <- paste0("<somemark>", pattern, "</somemark>")
gsub(pattern, replaceWith, text)
gsub returns the following:
یہ جملہ ایک مثال کے لیے استعمال <somemark>کیا جا</somemark> رہا ہے
Desired output (shown as an image in the original post):
How can I achieve the desired output?
Note: I could not even properly typeset the desired output in my post, I had to rely on an image instead.
Update: Although the mysub function below concatenates the strings correctly (in the console), I still get the text in the wrong order in my shiny app.
mysub <- function(text, pattern){
  beforePattern <- substr(text, 1, regexpr(pattern, text)[1]-1)
  afterPattern <- substr(text, regexpr(pattern,text)[1] + nchar(pattern), nchar(text))
  result <- paste(afterPattern, replaceWith, beforePattern)
  result
}
There is actually no problem with gsub:
text <- dput("یہ جملہ ایک مثال کے لیے استعمال کیا جا رہا ہے")
"<U+06CC><U+06C1> <U+062C><U+0645><U+0644><U+06C1> <U+0627><U+06CC><U+06A9>
<U+0645><U+062B><U+0627><U+0644> <U+06A9><U+06D2> <U+0644><U+06CC><U+06D2>
<U+0627><U+0633><U+062A><U+0639><U+0645><U+0627><U+0644> <U+06A9><U+06CC>
<U+0627> <U+062C><U+0627> <U+0631><U+06C1><U+0627> <U+06C1><U+06D2>"
pattern <- dput("کیا جا")
"<U+06A9><U+06CC><U+0627> <U+062C><U+0627>"
replaceWith <- dput(paste0("<somemark>", pattern, "</somemark>"))
"<somemark><U+06A9><U+06CC><U+0627> <U+062C><U+0627></somemark>"
dput(gsub(pattern, replaceWith, text))
"<U+06CC><U+06C1> <U+062C><U+0645><U+0644><U+06C1> <U+0627><U+06CC><U+06A9>
<U+0645><U+062B><U+0627><U+0644> <U+06A9><U+06D2> <U+0644><U+06CC><U+06D2>
<U+0627><U+0633><U+062A><U+0639><U+0645><U+0627><U+0644> <somemark><U+06A9>
<U+06CC><U+0627> <U+062C><U+0627></somemark> <U+0631><U+06C1><U+0627>
<U+06C1><U+06D2>"
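To make the same point without reading code points, here is a minimal sketch that splits the string around the match and reassembles it in logical (storage) order. The names start, before and after are only for illustration, and fixed = TRUE is my choice to treat the pattern as a literal string rather than a regular expression.

```r
text <- "یہ جملہ ایک مثال کے لیے استعمال کیا جا رہا ہے"
pattern <- "کیا جا"
replaceWith <- paste0("<somemark>", pattern, "</somemark>")

# gsub works on the stored (logical) order of characters, not the displayed order
result <- gsub(pattern, replaceWith, text, fixed = TRUE)

# Position of the match, counted in characters from the logical start of the string
start <- regexpr(pattern, text, fixed = TRUE)[1]
before <- substr(text, 1, start - 1)
after <- substr(text, start + nchar(pattern), nchar(text))

# Reassembling before + replacement + after reproduces the gsub result
identical(result, paste0(before, replaceWith, after))
```

If this comparison is TRUE, the stored string is what you asked for, and any surprise comes from how a mixed right-to-left and left-to-right string is displayed.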
The rendering of the result (a string containing both right-to-left and left-to-right characters) also seems quite logical to me:
یہ جملہ ایک مثال کے لیے استعمال
یہ جملہ ایک مثال کے لیے استعمال <somemark>
یہ جملہ ایک مثال کے لیے استعمال <somemark>کیا جا
یہ جملہ ایک مثال کے لیے استعمال <somemark>کیا جا</somemark>
یہ جملہ ایک مثال کے لیے استعمال <somemark>کیا جا</somemark> رہا ہے
Your idea of what should be rendered doesn't seem to me more logical, but I must admit I don't have experience with right to left text rendering.
Anyway, if the formatting is interpreted by the renderer, like the <b>...</b> tags in HTML, then it works perfectly (in markdown/html):
یہ جملہ ایک مثال کے لیے استعمال <b>کیا جا</b> رہا ہے
renders as
یہ جملہ ایک مثال کے لیے استعمال کیا جا رہا ہے
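As a rough sketch of that HTML route in R: the helper below wraps the match in a tag and puts the whole sentence in a paragraph with the standard HTML dir="rtl" attribute. The name mark_rtl and the choice of dir="rtl" are my assumptions, not something from the question, and I have not checked that this gives exactly the layout the asker wants.

```r
# Hypothetical helper: wrap every literal occurrence of `pattern` in an HTML tag
# and declare the paragraph as right-to-left for the browser.
mark_rtl <- function(text, pattern, tag = "b") {
  replacement <- paste0("<", tag, ">", pattern, "</", tag, ">")
  marked <- gsub(pattern, replacement, text, fixed = TRUE)
  paste0('<p dir="rtl">', marked, "</p>")
}

text <- "یہ جملہ ایک مثال کے لیے استعمال کیا جا رہا ہے"
pattern <- "کیا جا"
html <- mark_rtl(text, pattern)
cat(html, "\n")
```

The tags are consumed by the browser, so only the Urdu text remains visible and the mixed-direction display problem does not arise.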
In shiny, I have not managed to print anything but question marks:
???? ???????? ?????? ???????? ???? ?????? ?????????????? <somemark>?????? ????</somemark> ?????? ????
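Question marks like these often point to an encoding problem rather than a text-direction problem, although I have not verified that this is the cause here. A minimal sketch, assuming the source file encoding is the culprit: write the strings with \u escapes (code points copied from the dput output above), so that R marks them as UTF-8 regardless of how the file is saved.

```r
# Pattern written with \u escapes instead of literal Urdu characters
pattern <- "\u06A9\u06CC\u0627 \u062C\u0627"

Encoding(pattern)  # "UTF-8": strings built from \u escapes are marked as UTF-8
nchar(pattern)     # 6: length counted in characters, not bytes

# The code points match the dput() output for the pattern
codepoints <- utf8ToInt(pattern)
all(codepoints == c(0x06A9, 0x06CC, 0x0627, 0x20, 0x062C, 0x0627))
```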
I gave it a try. I did take the liberty of hard-coding the arguments instead of reading them from the session, though.
Server:
output$mysub <- function(){ # (text=NULL, pattern=NULL)
  text <- "یہ جملہ ایک مثال کے لیے استعمال کیا جا رہا ہے"
  pattern <- "کیا جا"
  Encoding(text) <- "UTF-8"
  Encoding(pattern) <- "UTF-8"
  print(text)
  beforePattern <- substr(text, 1, regexpr(pattern, text)[1]-1)
  afterPattern <- substr(text, regexpr(pattern,text)[1] + nchar(pattern), nchar(text))
  replaceWith <- paste0("<somemark>", pattern, "</somemark>")
  result <- paste(afterPattern, replaceWith, beforePattern)
  # result <- paste( beforePattern, replaceWith, afterPattern)
  # Encoding(result) <- "UTF-8"
  print(length(result))
  print(result)
  return(result)
}
# ui.R:
h2( textOutput("mysub") )
The output I got on the shiny webpage was:
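A note on textOutput: it escapes HTML, so tags arrive in the browser as literal text. If the goal is for the browser to interpret the markup, one option is htmlOutput with renderUI and HTML(). The following is only a sketch under that assumption: the <b> tag, the dir = "rtl" attribute and the requireNamespace guard are my choices, and the strings are hard-coded as before.

```r
text <- "یہ جملہ ایک مثال کے لیے استعمال کیا جا رہا ہے"
pattern <- "کیا جا"

# Keep the logical order: only wrap the match, do not reorder the pieces
marked <- gsub(pattern, paste0("<b>", pattern, "</b>"), text, fixed = TRUE)

# Only build the app if shiny is installed
if (requireNamespace("shiny", quietly = TRUE)) {
  ui <- shiny::fluidPage(
    shiny::h2(shiny::htmlOutput("mysub"))
  )
  server <- function(input, output) {
    output$mysub <- shiny::renderUI(
      # HTML() marks the string as raw markup so the tags are not escaped
      shiny::tags$span(dir = "rtl", shiny::HTML(marked))
    )
  }
  # Launch only in an interactive session
  if (interactive()) shiny::runApp(shiny::shinyApp(ui, server))
}
```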