
Can I use a regex as the record separator in awk on Linux?

Tags:

linux

bash

awk

I have a test file like this:

fdsf fdsf fdsfds fdsf
fdsfdsfsdf fdsfsf
fsdfsdf var12=1343243432

fdsf fdsf fdsfds fdsf
fdsfsdfdsfsdf
fsdfsdf var12=13432434432

fdsf fdsf fdsfds fdsf
fsdfsdf fdsfsf var12=13443432432

Now I want to use var12=\d+ as the record separator. Is this possible in awk?

user2024264 asked Feb 07 '13 02:02




2 Answers

Yes, however you should use [0-9] instead of \d:

awk '1' RS="var12=[0-9]+" file

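IIRC, a multi-character RS is treated as a regex by GNU awk (and by some other implementations, such as mawk). POSIX leaves the behavior unspecified, and some awks use only the first character of RS.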

Results:

fdsf fdsf fdsfds fdsf
fdsfdsfsdf fdsfsf
fsdfsdf 


fdsf fdsf fdsfds fdsf
fdsfsdfdsfsdf
fsdfsdf 


fdsf fdsf fdsfds fdsf
fsdfsdf fdsfsf 

Please post your desired output if you need further assistance.
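
To see how the input is actually split, it can help to print the record number and field count instead of the records themselves. The following is a minimal sketch, assuming an awk that treats a multi-character RS as a regex (such as GNU awk); the file name sample.txt and its contents are made up for illustration and are not from the question.

```shell
# Build a small sample file with the same shape as the question's input.
# (sample.txt and its contents are hypothetical.)
printf '%s\n' \
  'aaa bbb' \
  'ccc var12=111' \
  '' \
  'ddd eee var12=2222' > sample.txt

# RS is assigned before the file is read, so every match of the regex
# ends a record. The NF pattern skips any record with no fields, such as
# the leftover newline after the last match.
awk 'NF { printf "record %d has %d fields\n", NR, NF }' RS='var12=[0-9]+' sample.txt
# With GNU awk this prints:
#   record 1 has 3 fields
#   record 2 has 2 fields
```

With the default FS, newlines inside a record count as field separators, which is why the first record has three fields even though it spans two lines.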

Steve answered Nov 15 '22 08:11


Assuming GNU awk (a.k.a. gawk) on Linux, yes.

RS

This is awk's input record separator. Its default value is a string containing a single newline character, which means that an input record consists of a single line of text. It can also be the null string, in which case records are separated by runs of blank lines. If it is a regexp, records are separated by matches of the regexp in the input text.

Source: 7.5.1 Built-in Variables That Control awk, The GNU Awk User's Guide.
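
The difference between the default RS and the null-string RS described in the quote can be checked by counting records. This is a small sketch; paragraphs.txt and its contents are hypothetical and only serve the illustration.

```shell
# Two "paragraphs" separated by a run of blank lines.
# (paragraphs.txt is a made-up file name.)
printf 'a b\nc\n\n\nd e f\n' > paragraphs.txt

# Default RS (newline): every line is a record, blank lines included.
awk 'END { print NR }' paragraphs.txt            # prints 5

# RS set to the null string: runs of blank lines separate records.
awk -v RS='' 'END { print NR }' paragraphs.txt   # prints 2
```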

As @steve says, \d is not in the list of Regular Expression Operators or gawk-Specific Regexp Operators, so you need to use a bracket expression such as [0-9] or [[:digit:]] in place of your \d.

However, it's not clear from your question what you are ultimately trying to achieve. I've answered the question as asked, but I doubt I've solved your underlying problem. See also What is the XY problem?

Johnsyweb answered Nov 15 '22 08:11