
Convert encoding via iconv on Linux

I usually convert encodings with iconv, but today I ran into something new to me.
I made a test case to make my question clear:

The goal is to convert &#1575;&#1604;&#1581;&#1604;&#1602;&#1577; &#1575;&#1604;&#1579;&#1575;&#1604;&#1579;&#1577; to its UTF-8 version: الحلقة الثالثة

<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
<title> this text is from arabic language   </title>
</head>
<body>
<p><span> &#1575;&#1604;&#1581;&#1604;&#1602;&#1577; &#1575;&#1604;&#1579;&#1575;&#1604;&#1579;&#1577;</span></p>
</body>
</html>

I tried source encodings such as ASCII, LATIN1, and windows-1252, but with no luck. How do I tell what type of encoding this is so that I can convert it? Both Google Translate and the Stack Overflow editor were able to detect it and convert it.

Another example: the website http://kanjidict.stc.cx/recode.php converts the encoding correctly if I check the option "Assume HTML (default: handle as plain text)".

What am I missing, and what did those three websites do to convert it correctly?
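One way to see what is going on in the test case is to inspect its bytes. The sketch below is in Python, with variable names chosen only for illustration, and it assumes the file contains exactly the text shown in the test case. Under that assumption every byte is plain ASCII, so converting from windows-1252 to UTF-8 produces identical bytes, and the Arabic text exists only as decimal Unicode code points inside &#NNNN; references.

```python
import unicodedata

# First word of the test case body, as it is stored in the file.
raw = b"&#1575;&#1604;&#1581;&#1604;&#1602;&#1577;"

# Every byte is 7-bit ASCII, so a charset converter has nothing to transcode.
is_ascii = all(byte < 0x80 for byte in raw)

# Roughly what `iconv -f WINDOWS-1252 -t UTF-8` does to these bytes.
transcoded = raw.decode("cp1252").encode("utf-8")
unchanged = transcoded == raw

# Each number between "&#" and ";" is a Unicode code point in decimal.
first_char = chr(1575)
first_name = unicodedata.name(first_char)

print(is_ascii, unchanged, first_name)
```

So the task is to decode HTML numeric character references, which is a different job from converting between byte encodings.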

tawfekov asked Dec 29 '22 03:12


1 Answer

Well,

after a day of work I found the command I had lost track of: ascii2uni, from a package I had installed earlier.

On Ubuntu it ships in the uni2ascii package, so it can be installed with: sudo apt-get install uni2ascii

After some testing, I was able to convert a file to Unicode with this command:

ascii2uni -a D source.html > target.html

So the conversion can be done from the command line alone.
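Where ascii2uni is not available, the same idea can be sketched in Python. This is a minimal sketch and not the ascii2uni implementation: it assumes the input only uses decimal references of the form &#NNNN;, as in the test case, and that the source file is readable as windows-1252, as its meta tag declares. The names decode_decimal_refs and convert_file are made up for illustration.

```python
import re
import sys

# Matches decimal numeric character references such as &#1575;
_DECIMAL_REF = re.compile(r"&#([0-9]+);")


def _replace(match):
    """Return the character for one reference, or the reference unchanged."""
    code_point = int(match.group(1))
    # Leave values alone if they are outside Unicode or are surrogates,
    # because those cannot be written out as UTF-8.
    if code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        return match.group(0)
    return chr(code_point)


def decode_decimal_refs(text):
    """Replace every &#NNNN; with the Unicode character at code point NNNN."""
    return _DECIMAL_REF.sub(_replace, text)


def convert_file(src, dst, source_encoding="cp1252"):
    """Read src, decode its decimal references, and write dst as UTF-8."""
    with open(src, encoding=source_encoding) as fh:
        text = fh.read()
    with open(dst, "w", encoding="utf-8") as fh:
        fh.write(decode_decimal_refs(text))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Example: python3 decode_refs.py source.html target.html
    convert_file(sys.argv[1], sys.argv[2])
```

Saved under a hypothetical name such as decode_refs.py, it would be run as python3 decode_refs.py source.html target.html. Named references such as &amp; are left untouched. Whichever tool does the conversion, the output file holds UTF-8 text while the test case still declares charset=windows-1252, so the meta tag would need to be changed to utf-8 for browsers to render the result correctly.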

Cheers

tawfekov answered Jan 08 '23 22:01