I can do it in vim like so:
:%s/\%u2013/-/g
How do I do the equivalent in Perl? I thought this would do it but it doesn't seem to be working:
perl -i -pe 's/\x{2013}/-/g' my.dat
You CAN'T convert from Unicode to ASCII in general. Almost no Unicode character can be expressed in ASCII, and the few that can are encoded as exactly the same bytes in UTF-8 as in ASCII. UTF-8 is probably what your file is in.
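To make that concrete, here is a minimal sketch that prints the UTF-8 bytes of an ASCII hyphen and of the en dash. The helper name utf8_hex is made up for this illustration; Encode is a core module.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: UTF-8 bytes of a character string as hex pairs.
sub utf8_hex {
    my ($chars) = @_;
    my $bytes = encode('UTF-8', $chars);
    return join ' ', map { sprintf '%02X', $_ } unpack 'C*', $bytes;
}

print utf8_hex('-'), "\n";         # hyphen-minus U+002D: one byte, same as ASCII
print utf8_hex("\x{2013}"), "\n";  # en dash U+2013: three bytes, none of them ASCII
```

The hyphen comes out as 2D and the en dash as E2 80 93, so there is no ASCII byte for the en dash; it has to be replaced by something else.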
The ord() function is a built-in Perl function that returns the numeric value (code point) of the first character of a string. For an ASCII character this is its ASCII value; for any other character it is the Unicode code point. The function takes a string as its parameter and ignores everything after the first character.
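A quick check of ord() on an ASCII character, on the en dash from the question, and on a longer string (a small sketch using only built-ins):

```perl
use strict;
use warnings;

my $ascii_value = ord('A');          # 65, identical in ASCII and Unicode
my $en_dash     = ord("\x{2013}");   # 8211, which is 0x2013
my $first_only  = ord('AB');         # only the first character counts: 65

printf "%d %d %d\n", $ascii_value, $en_dash, $first_only;
```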
Unicode is a universal character set used to process, store, and interchange text in any language, while ASCII is a 7-bit encoding that covers only 128 characters: English letters, digits, punctuation, and control codes.
While Perl does not implement the Unicode standard or the accompanying technical reports from cover to cover, Perl does support many Unicode features. Also, the use of Unicode may present security issues that aren't obvious; see "Security Implications of Unicode" in perlunicode.
For a generic solution, Text::Unidecode transliterates pretty much anything that's thrown at it into pure US-ASCII.
So in your case this would work:
perl -C -MText::Unidecode -n -i -e'print unidecode( $_)' unicode_text.txt
The -C is there to make sure the input is read as UTF-8.
It converts this:
l'été est arrivé à peine après aôut
¿España es un paìs muy lindo?
some special chars: » « ® ¼ ¶ – – — Ṉ
Some greek letters: β ÷ Θ ¬ the α and ω (or is it Ω?)
hiragana? みせる です
Здравствуйте
السلام عليكم
into this:
l'ete est arrive a peine apres aout
?Espana es un pais muy lindo?
some special chars: >> << (r) 1/4 P - - -- N
Some greek letters: b / Th ! the a and o (or is it O?)
hiragana? miseru desu
Zdravstvuitie
lslm `lykm
The last one shows the limits of the module, which can't infer the vowels and get as-salaamu `alaykum from the original Arabic. It's still pretty good, I think.
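If installing a CPAN module is not an option and the input is mostly accented Latin text, a rough approximation is possible with the core module Unicode::Normalize. This is a different technique from Text::Unidecode, not a replacement for it: it only decomposes characters and drops the combining marks, so it leaves the en dash, Cyrillic, Arabic and so on untouched. The helper name strip_accents is made up for this sketch.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFKD);

# Hypothetical helper: decompose each character, then delete the
# combining marks (accents) that the decomposition split off.
sub strip_accents {
    my ($text) = @_;
    my $decomposed = NFKD($text);
    $decomposed =~ s/\p{Mn}+//g;
    return $decomposed;
}

# The French sample line from above, written with escapes so the
# script does not depend on its own source encoding.
my $french = "l'\x{e9}t\x{e9} est arriv\x{e9} \x{e0} peine apr\x{e8}s a\x{f4}ut";
print strip_accents($french), "\n";
```

For that sample line the result matches the Text::Unidecode output, but anything without a decomposition still needs an explicit substitution or the module.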
This did the trick for me:
perl -C1 -i -pe 's/–/-/g' my.dat
Note that the first dash in the pattern is the \x{2013} character itself.
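A likely reason the literal dash works where \x{2013} did not, assuming my.dat is UTF-8: without decoding, Perl reads the file as bytes, and a literal en dash typed into the one-liner is also just its three UTF-8 bytes, so the two match. The pattern \x{2013} is a single character and never appears in undecoded input. The sketch below shows the three cases on an in-memory string; the sample text is invented.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# UTF-8 bytes as they would arrive from a file read without decoding.
my $line = "pages 10\x{E2}\x{80}\x{93}20\n";

# 1. Code-point pattern against undecoded bytes: no match, nothing changes.
(my $unchanged = $line) =~ s/\x{2013}/-/g;

# 2. Byte-level pattern: matches the three UTF-8 bytes of U+2013.
(my $by_bytes = $line) =~ s/\xE2\x80\x93/-/g;

# 3. Character-level: decode first, match the code point, encode back.
my $text = decode('UTF-8', $line);
$text =~ s/\x{2013}/-/g;
my $by_chars = encode('UTF-8', $text);

print $by_bytes;
print $by_chars;
```

Cases 2 and 3 both produce "pages 10-20"; case 1 leaves the line as it was, which matches the behaviour described in the question.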