
Pipe one long line as multiple lines

Say I have a bunch of XML files which contain no newlines; each one is basically a long list of records delimited by </record><record>.

If the delimiter were </record>\n<record> I would be able to do something like cat *.xml | grep xyz | wc -l to count instances of records of interest, because cat would emit the records one per line.

Is there a way to write SOMETHING *.xml | grep xyz | wc -l where SOMETHING can stream out the records one per line? I tried using awk for this but couldn't find a way to avoid streaming the whole file into memory.

Hopefully the question is clear enough :)

nicolaskruchten asked Dec 21 '22 11:12


2 Answers

This is a little ugly, but it works:

sed 's|</record>|</record>\
|g' *.xml | grep xyz | wc -l

(Yes, I know I could make it a little bit shorter, but only at the cost of clarity.)
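One caveat: sed works a line at a time, so for a file with no newlines it still has to hold the whole file as a single line before substituting. If that matters, here is a minimal sketch that reads one record at a time instead. It assumes an awk that accepts a multi-character RS (gawk and mawk do; strict POSIX awk only honours the first character), and count_records is a hypothetical helper name, not something from the question. It matches the pattern as a fixed string rather than as a regex.

```shell
# count_records PATTERN FILE...
# Hypothetical helper: counts the records that contain PATTERN.
# Assumes an awk with multi-character RS support (gawk, mawk).
count_records() {
    pattern=$1
    shift
    # RS makes awk split the input on </record> instead of on newlines,
    # so each record is processed on its own.
    awk -v RS='</record>' -v pat="$pattern" '
        index($0, pat) { n++ }   # fixed-string match inside the current record
        END { print n + 0 }      # n + 0 prints 0 when nothing matched
    ' "$@"
}
```

Usage would be something like count_records xyz *.xml, which replaces the whole sed | grep | wc pipeline.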

Beta answered Dec 28 '22 07:12


If the part of the record body after your search string contains no <, / or > character (whichever one the pattern below excludes), you can try one of these:

grep -E -o 'SEARCH_STRING[^<]*</record>' *.xml | wc -l

or

grep -E -o 'SEARCH_STRING[^/]*/record>' *.xml | wc -l

or

grep -E -o 'SEARCH_STRING[^>]*>' *.xml | wc -l
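
To see what the first variant counts, here is a small sketch on made-up data (the sample string and the search term xyz are assumptions for illustration only). Because -o prints every match on its own line, wc -l ends up counting matching records rather than input lines. Note that grep also reads its input a line at a time, so a file with no newlines is still buffered as one line.

```shell
# Hypothetical sample: three records on a single line, two of which contain "xyz".
sample='<record>a xyz b</record><record>none</record><record>xyz</record>'

# -o emits each match on a separate line; wc -l then counts the matches.
count=$(printf '%s' "$sample" | grep -E -o 'xyz[^<]*</record>' | wc -l)

# Reports 2 for the sample above (some wc implementations pad the number with spaces).
echo "$count"
```

A record body with nested tags would break the [^<]* part, which is why the condition on the record body matters.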

Prince John Wesley answered Dec 28 '22 06:12