I need to convert all text to lowercase, but without the traditional "tr" command, because it does not handle UTF-8 text properly.
Is there a nice way to do that? I need some UNIX filter so I can process this in a pipe.
How do I convert uppercase words or strings to lowercase, or vice versa, in a Unix-like / Linux bash shell? Use the tr command to convert incoming text, words, or variable data from upper to lower case or vice versa (for example, to translate all uppercase characters to lowercase).
Here is the command to convert the character encoding of a file using the iconv command. In the above command you need to specify the present encoding of the file in place of from_encoding and the new encoding of the file in place of to_encoding. Here is the command to convert sample.txt from ISO-8859 to UTF-8 format.
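The same kind of conversion can be sketched in Python for cases where iconv is not at hand. This is only a minimal sketch and not the iconv invocation referred to above: the function name convert_encoding and the output file name are made up for illustration, and ISO-8859-1 is assumed as the concrete source encoding.

```python
def convert_encoding(src_path, dst_path, from_encoding, to_encoding):
    """Re-encode a text file from from_encoding to to_encoding (hypothetical helper)."""
    # newline="" turns off newline translation, so line endings are copied unchanged.
    with open(src_path, "r", encoding=from_encoding, newline="") as src, \
         open(dst_path, "w", encoding=to_encoding, newline="") as dst:
        # Copy line by line so large files are not loaded into memory at once.
        for line in src:
            dst.write(line)


# Example call (file names are assumptions):
# convert_encoding("sample.txt", "sample-utf8.txt", "iso-8859-1", "utf-8")
```

Decoding raises UnicodeDecodeError if the input is not valid in the stated source encoding, which is usually preferable to silently writing damaged text.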
GNU sed should be able to handle Unicode. Try:
$ echo 'Some StrAngÉ LeTTeRs 123' | sed -e 's/./\L\0/g'
some strangé letters 123
If you can use Python, code like this can help:
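import sys
import codecs
# Python 3: wrap the binary streams (.buffer) so the data is always
# decoded and encoded as UTF-8, whatever the locale or console code page is.
utf8input = codecs.getreader("utf-8")(sys.stdin.buffer)
utf8output = codecs.getwriter("utf-8")(sys.stdout.buffer)
# Read all input, lowercase it using Unicode rules, and write it back out.
utf8output.write(utf8input.read().lower())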
On my Windows machine (sorry :) I can use it as a filter:
cat big.txt | python tolowerutf8.py > lower.txt
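The script above reads the whole input with read(), so a very large file is held in memory at once. Here is a minimal sketch of a line-by-line variant, assuming Python 3; the function name lowercase_stream is made up for illustration.

```python
import io


def lowercase_stream(binary_in, binary_out):
    """Copy binary_in to binary_out, lowercasing UTF-8 text one line at a time."""
    # newline="" turns off newline translation, so line endings pass through unchanged.
    text_in = io.TextIOWrapper(binary_in, encoding="utf-8", newline="")
    text_out = io.TextIOWrapper(binary_out, encoding="utf-8", newline="")
    for line in text_in:
        # str.lower() applies Unicode case mapping, not only ASCII A-Z.
        text_out.write(line.lower())
    text_out.flush()
    # Detach so the wrappers do not close the caller's streams when discarded.
    text_in.detach()
    text_out.detach()


# To use it as a pipe filter, call it on the binary standard streams:
#   import sys
#   lowercase_stream(sys.stdin.buffer, sys.stdout.buffer)
```

Working on the binary streams keeps the result independent of the console code page, which matters on Windows, where the default encoding is often not UTF-8.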