 

How to check, in a Linux shell, the encoding of a string generated by a Python script

I run a Python script that generates a string and then executes a shell script using that string. I want to check the encoding of that string from the Linux shell, but without writing it to a file (disk operations are slow). Is it possible to check the encoding of a string in Linux (Ubuntu) using only RAM? Something like:

check-encoding 'My string with random encoding'

Checking the encoding in Python itself is too slow as well.

Asked by Eugene on May 05 '15



1 Answer

Try the file utility. You can pass any string to file by piping echo into it and giving - as the file argument (many commands accept a hyphen (-) in place of a filename to indicate that input should come from stdin rather than a file):

:~  $ echo "test" | file -i -
/dev/stdin: text/plain; charset=us-ascii

:~  $ echo "тест" | file -i -
/dev/stdin: text/plain; charset=utf-8

With a pipe to sed you can extract just the charset value:

:~  $ echo "тест" | file -i - | sed 's/.*charset=\(.*\)/\1/'
utf-8

or with awk (and you can combine them, of course):

:~  $ echo "тест" | file -i - | awk '{ print $3 }'
charset=utf-8
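
If the string was generated inside a Python script in the first place, the same check can be run from Python without ever writing a file, by piping the bytes into file -i - over stdin. This is a minimal sketch, not from the original post: detect_charset is an illustrative helper name, and it assumes the data is available as bytes (a Python str has no byte encoding until you encode it):

import subprocess

def detect_charset(data: bytes) -> str:
    """Pipe raw bytes into `file -i -` via stdin and return the reported charset."""
    result = subprocess.run(
        ["file", "-i", "-"],      # "-" tells file to read from stdin
        input=data,               # bytes travel through the pipe; nothing touches disk
        capture_output=True,
        check=True,
    )
    # Typical output: b"/dev/stdin: text/plain; charset=utf-8\n"
    return result.stdout.decode().rsplit("charset=", 1)[-1].strip()

print(detect_charset("тест".encode("utf-8")))   # utf-8
print(detect_charset(b"plain ascii text"))      # us-ascii

The parsing at the end does the same job as the sed/awk one-liners above, just in Python.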

You can also use the Python chardet module. chardet comes with a command-line script, chardetect, which reports the encoding of one or more files. Install it with:

pip install chardet

and use it with a pipe from echo:

:~  $ echo "тест" | chardetect
<stdin>: utf-8 with confidence 0.938125
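
The question notes that a pure-Python check was too slow, but if calling chardet as a library is still an option, the detection can stay entirely inside the generating script, in RAM. A minimal sketch (chardet works on bytes, so a str has to be encoded first, and guesses on very short inputs can be unreliable):

import chardet

# chardet.detect() takes bytes and returns a dict with the guessed
# encoding and a confidence score, all computed in memory.
raw = "тест".encode("utf-8")
guess = chardet.detect(raw)
print(guess["encoding"], guess["confidence"])   # e.g. utf-8 0.93...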
Answered by ndpu on Nov 15 '22