
How to auto-detect the encoding of an SRT subtitle file

Tags:

encoding

My product is weaker than a competitor's at auto-detecting the encoding of SRT subtitle files. I can auto-detect the encoding of SMI files, since they carry language information in the header, but I cannot do the same for SRT. How can I implement auto-detection for SRT files? Any good references on the algorithm that I could study as a first step would be appreciated. FYI, my product should support Western European, Central European, Cyrillic, Greek, Turkish, Hebrew, Arabic, Baltic, Korean, Simplified Chinese, Traditional Chinese, Vietnamese, and Thai.

kalingga asked Dec 12 '25 05:12


1 Answer

There are plenty of tools for detecting the charset of a text file (such as an SRT file). For example, on the Linux command line you can use chardet, whose command-line script is named chardetect when installed with pip:

chardetect subtitle_file_name.srt

This utility has to be installed first with pip (the Python package installer). On Ubuntu:

sudo apt-get install python3-pip
pip3 install chardet

If you need to integrate a detector into your application, there are also open-source libraries for the job. For example, in my tool DualSub, which is implemented in Java, I used juniversalchardet.
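As a rough illustration of what integrating a detector might look like, here is a minimal Python sketch. It is not DualSub's code (that is Java and uses juniversalchardet). The function name guess_encoding and the cp1252 fallback are assumptions made for this example. It checks for a byte-order mark, then tries strict UTF-8, and only then asks chardet for a statistical guess if the package happens to be installed.

```python
import codecs

# Byte-order marks, UTF-32 first so its LE mark is not mistaken for UTF-16 LE.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding(data, fallback="cp1252"):
    """Return a best-guess codec name for raw subtitle bytes.

    Hypothetical helper; the fallback codec is an arbitrary assumption.
    """
    # 1. A byte-order mark identifies the Unicode encodings directly.
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    # 2. Bytes that decode as strict UTF-8 are treated as UTF-8
    #    (this also covers plain ASCII).
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass
    # 3. Otherwise ask chardet for a statistical guess, if it is available.
    try:
        import chardet
    except ImportError:
        return fallback
    result = chardet.detect(data)  # dict with "encoding" and "confidence"
    return result["encoding"] or fallback
```

In an application you would read the file in binary mode, for example data = open(path, "rb").read(), and then decode it with data.decode(guess_encoding(data), errors="replace"). For legacy single-byte code pages (Greek, Turkish, Baltic and so on) the guess is statistical and can be wrong on short files, so letting the user override the result is a sensible safeguard.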

Boni García answered Dec 14 '25 16:12
