
Using cut in bash on a file with a unique delimiter

Can cut be used in bash with the ¬ delimiter?

This question is an extension of the topic covered here. One interpretation of the goal in that link is to use a delimiter that cannot be found (or is very rarely found) in human text. Say we choose the 'Not Sign' (¬) as a delimiter. My question is about using cut to pull specific columns from a file that uses this delimiter.

For example, say that we create a file with the ¬ delimiter. The file prac.txt might look like:

$ cat prac.txt
"Billy"¬"Car"¬"Red"¬"Garage"¬"3"
"Rob"¬"Truck"¬"Blue"¬"Street"¬"14"

The following command produces an error:

$ cut -d'¬' -f1 prac.txt
cut: the delimiter must be a single character
Try `cut --help' for more information.

The correct output would be:

"Billy"
"Rob"

Possibly useful info from python:

>>> import unicodedata
>>> unicodedata.lookup('Not sign')
u'\xac'

Possibly useful character conversion link.

My guess is that the -d flag either needs some representation of '¬' that I have not tried yet, or only works with single ASCII characters. Thanks in advance for any help.
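One way to check what cut actually receives is to dump the raw bytes of each candidate delimiter. This is a minimal sketch that assumes the snippet is saved (or pasted into a terminal) as UTF-8, the usual default on current Linux systems; `hex_bytes` is a hypothetical helper name, built only on the standard `od` and `tr` tools.

```shell
#!/usr/bin/env bash
# Assumption: this snippet is stored/pasted as UTF-8, so the literal '¬'
# below reaches printf as its UTF-8 byte sequence.

# hex_bytes (hypothetical helper): print the raw bytes of a string as hex,
# with od's spacing and line breaks stripped, followed by one newline.
hex_bytes() {
  printf '%s' "$1" | od -A n -t x1 | tr -d ' \n'
  echo
}

hex_bytes '|'   # one byte, so cut -d'|' is accepted
hex_bytes '¬'   # two bytes: U+00AC (Python's u'\xac') encoded as UTF-8
```

The pipe prints `7c` while the not sign prints `c2ac`, so the code point 0xAC that Python reports is not the single byte cut sees on disk.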

asked Mar 21 '23 18:03 by blehman


1 Answer

In UTF-8, the "not sign" (U+00AC) is encoded as two bytes, c2 ac, and cut doesn't handle this: GNU cut's -d option accepts only a single byte, so it rejects a multibyte delimiter, which is arguably a bug. See this discussion on unix.stackexchange.
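Since cut itself will not take the two-byte delimiter, two common workarounds are to let awk do the splitting, or to translate ¬ into a single-byte delimiter first and then call cut. The sketch below assumes UTF-8 data like prac.txt; it generates the sample file with printf octal escapes (`\302\254` are the bytes c2 ac) so it does not depend on the script's own encoding. The function names `field1_awk` and `field1_cut` and the choice of the ASCII unit separator (0x1F) are illustrative assumptions, not part of the original answer.

```shell
#!/usr/bin/env bash
# Sketch: extract field 1 from ¬-delimited data without passing ¬ to cut -d.
set -eu

not_sign=$(printf '\302\254')   # UTF-8 bytes of U+00AC NOT SIGN (c2 ac)
us=$(printf '\037')             # ASCII unit separator: a single byte

# Recreate prac.txt from the question in a temporary file.
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
printf '"Billy"%s"Car"%s"Red"%s"Garage"%s"3"\n' \
  "$not_sign" "$not_sign" "$not_sign" "$not_sign" > "$tmp"
printf '"Rob"%s"Truck"%s"Blue"%s"Street"%s"14"\n' \
  "$not_sign" "$not_sign" "$not_sign" "$not_sign" >> "$tmp"

# Workaround 1: awk. In a UTF-8 locale gawk sees ¬ as one character; in a
# byte locale the two-byte -F value is used as a regular expression.
# Either way, lines are split on ¬.
field1_awk() { awk -F"$not_sign" '{print $1}' "$1"; }

# Workaround 2: rewrite every ¬ as the single-byte unit separator,
# which cut -d does accept.
field1_cut() { sed "s/$not_sign/$us/g" "$1" | cut -d"$us" -f1; }

field1_awk "$tmp"
field1_cut "$tmp"
```

Applied directly to the real file, the same ideas read `awk -F'¬' '{print $1}' prac.txt` or `sed "s/¬/$(printf '\037')/g" prac.txt | cut -d"$(printf '\037')" -f1`. The unit separator is used because, like ¬, it almost never appears in human text; a tab is more readable if the data is known to contain no tabs.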

answered Mar 24 '23 07:03 by user2719058