I run a Python script that generates a string and then executes a shell script using that string. I want to check the encoding of that string from the Linux shell, but without writing the string to a file (disk operations are slow). Is it possible to check the encoding of a string in Linux (Ubuntu) using only RAM? Something like:
check-encoding 'My string with random encoding'
Checking the encoding in Python is too slow as well.
Check your file encoding: to see a file's current encoding, use the command below, replacing <filename> with the desired file. Convert your file encoding: once you know your file's encoding, you can convert the source file to a new one with the desired encoding.
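A minimal sketch of both steps, using file to detect and iconv to convert (the encodings and the .utf8 output name are examples):
:~ $ file -i <filename>
<filename>: text/plain; charset=iso-8859-1
:~ $ iconv -f ISO-8859-1 -t UTF-8 <filename> > <filename>.utf8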
Is it possible to force the shell (bash or sh) to detect the correct script encoding (something similar to the Python or Ruby encoding cookie)? The solution should aim for portability, so it is not necessary to stick with bash. EDIT: maybe I've found a possible solution using a recursive script call:
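One way such a recursive call can look, as a hypothetical sketch (the locale name en_US.UTF-8 is an assumption): the script re-executes itself once under a UTF-8 locale whenever the current one is not UTF-8.
#!/bin/sh
# Hypothetical sketch: force a UTF-8 locale by re-executing the script.
case "${LC_ALL:-${LC_CTYPE:-$LANG}}" in
  *[Uu][Tt][Ff]-8* | *[Uu][Tt][Ff]8*) ;;    # already UTF-8, carry on
  *) LC_ALL=en_US.UTF-8 exec "$0" "$@" ;;   # assumed available locale
esac
echo "interpreting with LC_ALL=${LC_ALL:-unset}"
This stays within POSIX sh, which fits the portability requirement.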
If you're talking about XML files (ISO-8859-1), the XML declaration inside them specifies the encoding: <?xml version="1.0" encoding="ISO-8859-1" ?>. So you can use regular expressions (e.g., with Perl) to check every file for such a specification. More information can be found here: How to Determine Text File Encoding.
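A sketch of such a check as a Perl one-liner (file.xml is a placeholder; declarations that use single quotes would need a slightly broader pattern):
:~ $ perl -ne 'print "$1\n" and last if /<\?xml[^>]*encoding="([^"]+)"/;' file.xml
ISO-8859-1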
For a more exhaustive check, you can test against every encoding PHP supports via mb_list_encodings(), and you can list all files in a directory and its subdirectories together with their detected encodings. Remember that a naive loop over filenames breaks on paths containing spaces unless you change your current Bash session's word splitting (IFS).
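In plain shell, a sketch of that directory-wide listing (the file names shown are examples); using find's -exec sidesteps the whitespace problem entirely, since no word splitting happens on the paths:
:~ $ find . -type f -exec file -i {} +
./notes.txt:     text/plain; charset=us-ascii
./docs/тест.txt: text/plain; charset=utf-8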
Try the file utility. You can pass any string to file by piping it from echo and giving - as the file argument (many commands accept a hyphen (-) in place of a filename to indicate that input should come from stdin rather than a file):
:~ $ echo "test" | file -i -
/dev/stdin: text/plain; charset=us-ascii
:~ $ echo "тест" | file -i -
/dev/stdin: text/plain; charset=utf-8
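Note that echo appends a trailing newline to the string; if you want to check exactly the bytes of the string as generated, printf avoids it (a minor variation on the same idea):
:~ $ printf '%s' "тест" | file -i -
/dev/stdin: text/plain; charset=utf-8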
With a pipe to sed:
:~ $ echo "тест" | file -i - | sed 's/.*charset=\(.*\)/\1/'
utf-8
Or to awk (you can of course combine them):
:~ $ echo "тест" | file -i - | awk '{ print $3 }'
charset=utf-8
You can also use the Python chardet module. chardet comes with a command-line script, chardetect, which reports the encodings of one or more files. Just install it with:
pip install chardet
and use it with a pipe from echo:
:~ $ echo "тест" | chardetect
<stdin>: utf-8 with confidence 0.938125
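Since the string is generated in Python in the first place, you can also call chardet's detect() function directly and skip the extra process; a sketch that prints just the detected encoding:
:~ $ echo "тест" | python3 -c 'import sys, chardet; print(chardet.detect(sys.stdin.buffer.read())["encoding"])'
utf-8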