How to auto-detect the encoding of an SRT subtitle file

Question

I have a product whose automatic encoding detection for SRT subtitle files is weaker than a competitor's. I can auto-detect the encoding of SMI files, since they carry language info in the header, but I cannot do the same for SRT files. How can I add this auto-detection for SRT files? Any good references on the algorithm that I could study as a first step would be appreciated. FYI, my product should support Western European, Central European, Cyrillic, Greek, Turkish, Hebrew, Arabic, Baltic, Korean, Simplified Chinese, Traditional Chinese, Vietnamese, and Thai.

Boni García · Accepted Answer

There are plenty of tools for detecting the charset of a text file (such as an SRT file). For example, on the command line of a Linux machine you can use chardet:

chardet subtitle_file_name.srt

This utility must first be installed with pip (the Python package installer). On Ubuntu:

sudo apt-get install python-pip
pip install chardet

If you need to integrate a detector into your application, there are also open-source libraries that do the job. For example, in my tool DualSub, which is implemented in Java, I used juniversalchardet.
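The same chardet package can also be called as a library from Python. Below is a minimal sketch of how that might look, not a definitive implementation. The helper names `guess_encoding` and `read_subtitles` and the Latin-1 last-resort fallback are my own assumptions for illustration. The BOM and strict UTF-8 checks in front of `chardet.detect` are cheap shortcuts I added; they are not something chardet requires.

```python
import codecs

try:
    import chardet  # third-party: pip install chardet
except ImportError:
    chardet = None  # the sketch still handles BOM and UTF-8 input without it

# Check longer BOMs first: the UTF-32 LE BOM begins with the UTF-16 LE BOM bytes.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding(data):
    """Return a Python codec name for the raw bytes, or None if unknown.

    Hypothetical helper, written for this answer only.
    """
    # 1. A byte order mark identifies the Unicode encoding directly.
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    # 2. Legacy single-byte or CJK text rarely happens to be valid UTF-8.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass
    # 3. Let chardet make a statistical guess for legacy encodings.
    if chardet is not None:
        return chardet.detect(data)["encoding"]  # may itself be None
    return None


def read_subtitles(path):
    """Read a subtitle file and decode it with the guessed encoding."""
    with open(path, "rb") as f:
        data = f.read()
    # Assumption: fall back to Latin-1, which can decode any byte sequence.
    encoding = guess_encoding(data) or "latin-1"
    return data.decode(encoding, errors="replace")
```

`chardet.detect` also returns a `confidence` value in its result dictionary. For short subtitle files the guess can be wrong, so in a real product it is probably worth letting the user override the detected encoding.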

Tags:

encoding

kalingga