I usually convert encodings with iconv,
but today I ran into something new to me.
I made a test case to make my question clear:
the goal is to convert الحلقة الثالثة
to its UTF-8 version: الحلقة الثالثة
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
<title> this text is from arabic language </title>
</head>
<body>
<p><span> الحلقة الثالثة</span></p>
</body>
</html>
I tried encodings like ASCII, LATIN1, and windows-1252,
but with no luck.
How do I tell what type of encoding this is so that I can convert it?
Both Google Translate and the Stack Overflow editor were able to detect it and convert it.
Another example: this website http://kanjidict.stc.cx/recode.php converts the encoding correctly if I check "Assume HTML (default: handle as plain text)".
What am I missing, and what do those three sites do to convert it correctly?
Well,
after a day of work I found the command I was missing. It comes from a package I had installed called ascii2uni.
Install it with: sudo apt-get install ascii2uni
After some testing I was able to convert a file to Unicode with this command:
ascii2uni -a D source.html > target.html
So the whole conversion can be done from the command line.
cheers
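
For anyone without ascii2uni, here is a rough Python sketch of what I believe the same conversion amounts to. It assumes the Arabic text in the file is stored as decimal HTML numeric character references such as `&#1575;`, which, as far as I can tell, is the format the `-a D` option targets. The function names and file paths are made up for illustration, and reading the source as cp1252 is only a guess based on the meta tag in the test case.

```python
import re

# Assumption: the non-ASCII text is stored as decimal references like "&#1575;".
DECIMAL_REF = re.compile(r"&#(\d+);")


def decode_decimal_refs(text: str) -> str:
    """Replace each decimal reference &#N; with the character at code point N.

    Named entities such as &lt; and &amp; are left alone so the markup stays valid.
    """
    def replace(match: re.Match) -> str:
        code_point = int(match.group(1))
        # Leave values that are not valid Unicode scalar values untouched.
        if code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
            return match.group(0)
        return chr(code_point)

    return DECIMAL_REF.sub(replace, text)


def convert_file(source_path: str, target_path: str) -> None:
    """Hypothetical helper: read the page, decode the references, write UTF-8."""
    # cp1252 follows the charset declared in the test page; this is an assumption.
    with open(source_path, encoding="cp1252") as source:
        text = source.read()
    with open(target_path, "w", encoding="utf-8") as target:
        target.write(decode_decimal_refs(text))


if __name__ == "__main__":
    # 1575 and 1604 are the code points of the Arabic letters alef and lam.
    print(decode_decimal_refs("<span>&#1575;&#1604;</span>"))
```

After converting, the charset in the page's meta tag should also be changed to utf-8 by hand, since this sketch only rewrites the references and does not touch the declaration.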