 

Is my PDF file encoded in UTF-8?

I would like to find out whether a PDF file is encoded in UTF-8. How can I check which character encoding a PDF file uses?

Ronald asked Feb 08 '17 at 14:02


1 Answer

A PDF is a binary file, not a text file.

A character encoding like "UTF-8" only makes sense for text files (*.txt, *.html, *.xml, *.csv, ...).

Thus, a PDF file as a whole is never UTF-8 encoded.
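You can see this for yourself by looking at the raw bytes. The Python sketch below is only an illustration: the names `inspect_pdf_bytes` and `PdfHeaderReport` are made up for this example and do not come from any library. It checks for the `%PDF-` signature, checks for the comment line that the PDF specification expects right after the header when a file contains binary data (at least four bytes with values of 128 or more), and then tries to decode the whole file as UTF-8. If the decode fails, the file is clearly not UTF-8 text. If it succeeds, that only means the bytes happen to be ASCII-compatible. It does not make the PDF a UTF-8 text file.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PdfHeaderReport:
    is_pdf: bool              # data starts with the "%PDF-" signature
    version: Optional[str]    # version text from the header line, e.g. "1.7"
    has_binary_marker: bool   # 2nd line is a comment with >= 4 bytes of value 128+
    decodes_as_utf8: bool     # the whole byte string happens to be valid UTF-8


def inspect_pdf_bytes(data: bytes) -> PdfHeaderReport:
    """Illustrative check of a PDF's header bytes (hypothetical helper)."""
    # Only the first few lines matter for the header, so avoid splitting the whole file.
    head_lines = data[:1024].splitlines()
    is_pdf = data.startswith(b"%PDF-")
    version = None
    has_binary_marker = False
    if is_pdf:
        # Header line looks like b"%PDF-1.7"; keep what follows the signature.
        version = head_lines[0][len(b"%PDF-"):].decode("ascii", errors="replace").strip()
        # Convention for files with binary content: a comment line of high bytes follows.
        if len(head_lines) > 1 and head_lines[1].startswith(b"%"):
            high_bytes = sum(1 for b in head_lines[1][1:] if b >= 128)
            has_binary_marker = high_bytes >= 4
    # Try to read the entire file as UTF-8 text; binary PDFs normally fail here.
    try:
        data.decode("utf-8")
        decodes_as_utf8 = True
    except UnicodeDecodeError:
        decodes_as_utf8 = False
    return PdfHeaderReport(is_pdf, version, has_binary_marker, decodes_as_utf8)


if __name__ == "__main__":
    # Synthetic bytes shaped like the start of a PDF (not a complete, valid document).
    sample = b"%PDF-1.7\n%\xe2\xe3\xcf\xd3\n1 0 obj\n<<>>\nendobj\n"
    print(inspect_pdf_bytes(sample))
```

To check a real file, pass in `pathlib.Path("your.pdf").read_bytes()`, where the file name is a placeholder.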

mkl answered Oct 16 '22 at 03:10