
How will the cut options -b and -c differ with internationalization?

Tags: bash, cut

`-b BYTE-LIST'
`--bytes=BYTE-LIST'
     Select for printing only the bytes in positions listed in
     BYTE-LIST.  Tabs and backspaces are treated like any other
     character; they take up 1 byte.  If an output delimiter is
     specified, (see the description of `--output-delimiter'), then
     output that string between ranges of selected bytes.

`-c CHARACTER-LIST'
`--characters=CHARACTER-LIST'
     Select for printing only the characters in positions listed in
     CHARACTER-LIST.  The same as `-b' for now, but
     internationalization will change that.  Tabs and backspaces are
     treated like any other character; they take up 1 character.  If an
     output delimiter is specified, (see the description of
     `--output-delimiter'), then output that string between ranges of
     selected bytes.

The description for -c says: "The same as `-b' for now, but internationalization will change that."

I am assuming that internationalized text in some languages can contain multi-byte characters, and that this is when -c and -b will behave differently. Is that correct?

Ankur Agarwal asked Sep 16 '25 12:09


1 Answer

Yes. With a UTF-8 encoded file, a quick test shows the difference:

$ cat a
200
bést
203
-Ümlaut
$ cut -b2-3 a
00
é           <---- é has 2 bytes
03
Ü           <---- Ü has 2 bytes
$ cut -c2-3 a
00
és
03
Üm
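
The byte counts behind that output can be checked without depending on the locale. This is only a sketch: the variable names (word, total_bytes, bytes_hex) are made up for illustration, and the word is written with octal escapes on the assumption that the file is UTF-8 encoded. It deliberately sticks to -b, because whether -c counts characters instead of bytes depends on the cut build and the active locale, and some builds still treat -c like -b.

```shell
# "bést" written with octal escapes so the bytes are unambiguous:
# b = 0x62, é = 0xC3 0xA9 (two bytes in UTF-8), s = 0x73, t = 0x74
word='b\303\251st'

# wc -c always counts bytes: the word has 4 characters but 5 bytes.
total_bytes=$(printf "$word" | wc -c | tr -d ' ')
echo "bytes: $total_bytes"

# cut -b2-3 selects bytes 2 and 3, which are exactly the two bytes of é.
# od prints them in hex; the trailing 0a is the newline that cut emits.
bytes_hex=$(printf "$word\n" | cut -b2-3 | od -An -tx1 | tr -d ' \n')
echo "cut -b2-3: $bytes_hex"
```

So -b2-3 returns only é because é already fills both byte positions, while a character-aware -c2-3 would return é and the following s.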
fedorqui 'SO stop harming' answered Sep 19 '25 07:09