Could sed or awk use NUL character as record separator?

Tags: sed, awk, nul

I have NUL-delimited output coming from the following command:

some commands | grep -i -c -w -Z 'some regex'

The output consists of records in the format:

[file name]\0[pattern count]\0

I want to use text-manipulation tools such as sed or awk to change the records to the following format:

[file name]:[pattern count]\0

But sed and awk seem to handle only newline-delimited records by default. How can sed or awk be used to do this, or, if they cannot handle this case, what other Linux tool should I use?

Thanks for any suggestion.

Lawrence

user1129812 asked Feb 07 '12 02:02


4 Answers

By default, the record separator is the newline character, defining a record to be a single line of text. You can use a different character by changing the built-in variable RS. The value of RS is a string that says how to separate records; the default value is \n, the string containing just a newline character.

 awk 'BEGIN { RS = "/" } ; { print $0 }' BBS-list
Tejas Patil answered Oct 05 '22 16:10


Since version 4.2.2, GNU sed has the -z or --null-data option to do exactly this. For example:

sed -z 's/old/new/' null_separated_infile
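
As a small illustration of what -z changes, the sketch below runs the same substitution over two NUL-terminated records. The function name replace_in_records and the sample records are made up for the demo, and GNU sed 4.2.2 or later is assumed.

```shell
# Hypothetical helper: replace the first "old" in every NUL-terminated record.
# Assumes GNU sed 4.2.2 or later, which provides -z / --null-data.
replace_in_records() {
    sed -z 's/old/new/'
}

# Demo: feed two NUL-terminated records, then turn the NULs into newlines
# only so the result is readable on a terminal.
printf '%s\0' 'old one' 'old two' | replace_in_records | tr '\0' '\n'
```

Because each record is processed separately, the substitution applies once per record rather than once for the whole input.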
Graeme answered Oct 05 '22 18:10


Yes, gawk can do this: set the record separator to \0. For example, the command

gawk 'BEGIN { RS="\0"; FS="=" } $1=="LD_PRELOAD" { print $2 }' </proc/$(pidof mysqld)/environ

will print the value of the LD_PRELOAD variable:

/usr/lib/x86_64-linux-gnu/libjemalloc.so.1

The /proc/$PID/environ file is a NUL-separated list of environment variables. I'm using it as an example because it's easy to try on a Linux system.

The BEGIN part sets the record separator to \0 and the field separator to = because I also want to extract the part after = based on the part before =.

The $1=="LD_PRELOAD" runs the block if the first field has the key I'm interested in.

The print $2 block prints the string after = (only up to the next =, if the value itself contains one).
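
The same RS setting can be applied to the format in the question. This is only a sketch: the function name join_pairs_gawk and the sample data are invented, and it assumes GNU awk and that records strictly alternate between file name and count.

```shell
# Hypothetical helper: join each (file name, count) pair of NUL-terminated
# records into a single "name:count" record, still NUL-terminated.
# Assumes GNU awk (gawk) is installed.
join_pairs_gawk() {
    gawk 'BEGIN { RS = "\0"; ORS = "\0" }
          NR % 2 { name = $0; next }      # odd records: remember the file name
                 { print name ":" $0 }'   # even records: emit name:count
}

# Demo: tr is used only to make the NUL-terminated result readable.
printf '%s\0' a.txt 3 b.txt 7 | join_pairs_gawk | tr '\0' '\n'
```

Setting ORS as well as RS keeps the output NUL-terminated, so file names containing newlines survive intact.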


But mawk cannot parse input files separated with NUL. This is documented in man mawk:

BUGS
       mawk cannot handle ascii NUL \0 in the source or data files.

mawk will stop reading the input after the first \0 character.


You can also use xargs to handle NUL separated input, a bit non-intuitively, like this:

xargs -0 -n1 </proc/$$/environ

xargs uses echo as the default command. -0 sets the input to be NUL-separated, and -n1 limits each echo invocation to one argument, so the output is separated by newlines.
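
The same idea can be stretched to the format in the question by taking two arguments at a time and handing them to printf instead of echo. This is a sketch under assumptions: join_pairs_xargs and the sample data are invented, the xargs in use must support -0, and the printf on the PATH must understand the \0 escape (the coreutils one does).

```shell
# Hypothetical helper: read NUL-separated records two at a time and print
# each pair as "name:count" followed by a NUL.
# xargs runs the external printf command once per pair.
join_pairs_xargs() {
    xargs -0 -n2 printf '%s:%s\0'
}

# Demo: tr is used only to make the NUL-terminated result readable.
printf '%s\0' a.txt 3 b.txt 7 | join_pairs_xargs | tr '\0' '\n'
```

This starts one printf process per pair, so it is slower than the awk or sed routes on large inputs, and an odd number of records would leave the last name with an empty count.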


And as Graeme's answer shows, sed can do this too.

Paul Tobias answered Oct 05 '22 16:10


Using sed to replace the NUL characters with spaces:

sed 's/\x0/ /g' infile > outfile

Or edit the file in place (this keeps a backup of the original as infile.bak and overwrites infile with the substitutions):

sed -i.bak 's/\x0/ /g' infile

Using tr to delete the NUL characters:

tr -d "\000" < infile > outfile
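
Deleting the NULs fuses the file name and the count together. If the goal is the [file name]:[pattern count]\0 format from the question, tr can instead translate NULs to newlines, let paste join the lines in pairs, and translate back. This is a sketch only: join_pairs_paste and the sample data are invented, and it assumes no file name contains a newline.

```shell
# Hypothetical helper: turn NULs into newlines, join every two lines with ":",
# then turn the newlines back into NULs.
# Assumption: no record (file name) contains a newline of its own.
join_pairs_paste() {
    tr '\0' '\n' | paste -d: - - | tr '\n' '\0'
}

# Demo: the final tr is only there to make the result readable.
printf '%s\0' a.txt 3 b.txt 7 | join_pairs_paste | tr '\0' '\n'
```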
jaypal singh answered Oct 05 '22 16:10