
Fix PDF encoding [closed]

I have Arabic PDF files, and it seems that something is wrong with their encoding.

When I search the PDF for a word that appears in it, no results are found.

When I export the PDF contents to Excel using other programs, the data comes out in a strange encoding.

When I copy text from the PDF into Notepad, Notepad displays it in a strange encoding.

I am developing a solution that will use these PDFs (about 950 files), so I must find a way to fix the encoding.

Thanks in advance.

M_1100 asked Nov 13 '22 14:11


1 Answer

Disclaimer: I've never edited an Arabic file.

How did you export the .pdf contents to Excel?

You cannot open a .pdf file directly in Word, Excel, WordPad, or Notepad. The strange encoding you're seeing is most probably the specific encoding of one of the font resources used in the PDF.
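To illustrate what a font-specific encoding means, here is a toy Python model (standard library only). Inside a PDF, text is drawn as character codes that are interpreted through the font, and a font can carry a `/ToUnicode` CMap that maps those codes back to Unicode. The codes and the `TOY_TO_UNICODE` table below are hypothetical, and decoding the raw codes as Latin-1 is only a simplification of what an extraction tool does when it has no usable mapping.

```python
# Toy model of a PDF font with its own encoding. The codes 0x21..0x25 and the
# table are HYPOTHETICAL; a real font defines its own codes and mapping.
TOY_TO_UNICODE = {
    0x21: "\u0645",  # ARABIC LETTER MEEM
    0x22: "\u0631",  # ARABIC LETTER REH
    0x23: "\u062d",  # ARABIC LETTER HAH
    0x24: "\u0628",  # ARABIC LETTER BEH
    0x25: "\u0627",  # ARABIC LETTER ALEF
}


def extract_without_map(codes: bytes) -> str:
    """Simplified 'no mapping' extraction: read the raw codes as Latin-1 text."""
    return codes.decode("latin-1")


def extract_with_map(codes: bytes, to_unicode: dict) -> str:
    """Look every code up in the font's mapping; unknown codes become U+FFFD."""
    return "".join(to_unicode.get(code, "\ufffd") for code in codes)


if __name__ == "__main__":
    codes = bytes([0x21, 0x22, 0x23, 0x24, 0x25])
    print(extract_without_map(codes))  # !"#$%
    print(extract_with_map(codes, TOY_TO_UNICODE) == "\u0645\u0631\u062d\u0628\u0627")  # True
```

The same bytes yield punctuation or Arabic depending only on which mapping is applied, which is why searching, copying, and exporting can all produce garbage while the page still renders correctly.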

You can use this tool to detect the encoding, but I strongly advise you to read the bare minimum about Unicode and Character Sets.
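To make the Unicode point concrete, here is a minimal Python sketch (standard library only) showing the same Arabic word as bytes in two charsets, how mojibake appears when bytes are read with the wrong charset, and a crude charset guess. The candidate list, the Arabic-ratio score, and the helper names are my own assumptions, not the detection tool mentioned above. Closely related Arabic code pages such as cp1256 and ISO-8859-6 can score the same, so treat the guess as a hint. Also, if the PDF's font has no usable Unicode mapping, no charset guess will repair the extracted text.

```python
from typing import Optional

# Assumed candidate charsets for Arabic text; ties keep the earlier entry.
CANDIDATES = ("utf-8", "cp1256", "iso-8859-6")


def arabic_ratio(text: str) -> float:
    """Fraction of alphabetic characters that fall in the Arabic block U+0600-U+06FF."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    return sum("\u0600" <= ch <= "\u06ff" for ch in letters) / len(letters)


def guess_encoding(data: bytes, candidates=CANDIDATES) -> Optional[str]:
    """Return the candidate whose decoding looks most Arabic, or None if none does."""
    best, best_score = None, 0.0
    for enc in candidates:
        try:
            text = data.decode(enc)
        except UnicodeDecodeError:
            continue  # these bytes are not valid in this charset
        score = arabic_ratio(text)
        if score > best_score:
            best, best_score = enc, score
    return best


def repair_mojibake(garbled: str, shown_as: str, actual: str) -> str:
    """Undo a wrong decode: recover the original bytes, then decode them correctly."""
    return garbled.encode(shown_as).decode(actual)


if __name__ == "__main__":
    word = "\u0645\u0631\u062d\u0628\u0627"  # Arabic for "hello"
    print(len(word.encode("utf-8")), len(word.encode("cp1256")))  # 10 5
    garbled = word.encode("cp1256").decode("latin-1")  # what a Latin-1 viewer shows
    print(garbled)
    print(repair_mojibake(garbled, "latin-1", "cp1256") == word)  # True
    print(guess_encoding(word.encode("utf-8")))  # utf-8
```

The repair works only when you know both the charset the text was wrongly shown in and the one it was really written in; the guess function is one cheap way to narrow down the second.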

From there, given the number of files involved, PyODConverter seems like a good solution.
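With roughly 950 files, a small driver that skips files already converted and records failures instead of stopping can help, whichever converter you choose. This is a hedged Python sketch, not PyODConverter's API: the `command` list is a placeholder, and the sketch assumes the converter takes the input and output paths as its last two arguments. Check your converter's own usage before relying on that.

```python
import subprocess
from pathlib import Path


def plan_conversions(src_dir, dst_dir, out_suffix=".doc"):
    """List (pdf, target) pairs, skipping PDFs whose target file already exists."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    jobs = []
    for pdf in sorted(src_dir.iterdir()):
        # Match .pdf case-insensitively so files like "Report.PDF" are included.
        if pdf.is_file() and pdf.suffix.lower() == ".pdf":
            target = dst_dir / (pdf.stem + out_suffix)
            if not target.exists():  # lets a rerun resume where it stopped
                jobs.append((pdf, target))
    return jobs


def run_conversions(jobs, command):
    """Run command + [src, dst] once per job; collect failures instead of aborting."""
    failures = []
    for src, dst in jobs:
        dst.parent.mkdir(parents=True, exist_ok=True)
        result = subprocess.run(
            [*command, str(src), str(dst)], capture_output=True, text=True
        )
        if result.returncode != 0:
            failures.append((src, result.stderr.strip()))
    return failures
```

A usage example, where `convert_one.py` is a hypothetical per-file converter script: `failures = run_conversions(plan_conversions("pdfs", "out"), ["python", "convert_one.py"])`. Reviewing `failures` at the end is easier than restarting a 950-file run after the first bad file.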

For a smaller number of files, Free PDF to Word Converter will take care of your needs.

Joao Figueiredo answered Mar 06 '23 16:03