
How to convert text file to lowercase in UNIX (but in UTF-8)

Tags: linux, unix

I need to convert all text in a file to lowercase, but without the traditional "tr" command, because it does not handle UTF-8 text properly.

Is there a nice way to do that? I need a UNIX filter so I can use it in a pipe.

lzap asked Sep 24 '10 08:09



2 Answers

GNU sed should be able to handle Unicode (\L is a GNU extension that lowercases the replacement text). Try:

$ echo 'Some StrAngÉ LeTTeRs 123' | sed -e 's/./\L\0/g'
some strangé letters 123
aioobe answered Sep 27 '22 23:09


If you can use Python, code like this can help:

import sys
import codecs

utf8input = codecs.getreader("utf-8")(sys.stdin)
utf8output = codecs.getwriter("utf-8")(sys.stdout)

utf8output.write(utf8input.read().lower())

On my Windows machine (sorry :) I can use it as a filter:

cat big.txt | python tolowerutf8.py > lower.txt
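
The script above appears to be written for Python 2: it wraps sys.stdin directly in a byte decoder, which does not work with the text streams that Python 3 provides. A minimal Python 3 sketch of the same filter might look like the following. The function name lowercase_stream and the file name tolowerutf8_py3.py are made up for illustration, and reconfigure() assumes Python 3.7 or newer.

```python
import sys


def lowercase_stream(src, dst):
    """Copy text from src to dst line by line, lowercasing each line."""
    for line in src:
        # str.lower() applies Unicode case mapping, not just ASCII A-Z.
        dst.write(line.lower())


def main():
    # Force UTF-8 on both ends so the result does not depend on the locale.
    # reconfigure() exists on the standard text streams since Python 3.7.
    sys.stdin.reconfigure(encoding="utf-8")
    sys.stdout.reconfigure(encoding="utf-8")
    lowercase_stream(sys.stdin, sys.stdout)


if __name__ == "__main__":
    main()
```

Saved under the hypothetical name tolowerutf8_py3.py, it would be used the same way: cat big.txt | python3 tolowerutf8_py3.py > lower.txt. Reading line by line avoids holding the whole file in memory, unlike the read() call in the original.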
Michał Niklas answered Sep 28 '22 01:09