Say I have a bunch of XML files which contain no newlines, but basically contain a long list of records, delimited by </record><record>. If the delimiter were </record>\n<record>, I would be able to do something like cat *.xml | grep xyz | wc -l to count instances of records of interest, because cat would emit the records one per line.
Is there a way to write SOMETHING *.xml | grep xyz | wc -l where SOMETHING can stream out the records one per line? I tried using awk for this but couldn't find a way to avoid streaming the whole file into memory.
Hopefully the question is clear enough :)
This is a little ugly, but it works:
sed 's|</record>|</record>\
|g' *.xml | grep xyz | wc -l
(Yes, I know I could make it a little bit shorter, but only at the cost of clarity.)
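One caveat: sed works line by line, so a file with no newlines reaches it as one very long line and is held in memory in full. If that matters, something along these lines may be worth a try. It is only a sketch, and it assumes an awk that accepts a multi-character record separator (gawk, mawk and BusyBox awk do; POSIX only guarantees that the first character of RS is used). The function name count_records is made up for illustration.

```shell
# Hypothetical helper: count records matching a pattern, reading one
# record at a time instead of one line at a time.
# Assumption: awk treats a multi-character RS as the record separator
# (gawk, mawk, BusyBox awk); POSIX awk only promises single-character RS.
count_records() {
    pattern=$1
    shift
    awk -v RS='</record>' -v pat="$pattern" '
        $0 ~ pat { n++ }          # $0 is one record, minus the closing tag
        END      { print n + 0 }  # n + 0 prints 0 when nothing matched
    ' "$@"
}
```

Usage would be count_records xyz *.xml. Because awk only has to hold one record at a time, memory use should follow the size of the largest record rather than the size of the file.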
If your record body has no character like <, / or >, then you may try this:
grep -E -o 'SEARCH_STRING[^<]*</record>' *.xml| wc -l
or
grep -E -o 'SEARCH_STRING[^/]*/record>' *.xml| wc -l
or
grep -E -o 'SEARCH_STRING[^>]*>' *.xml| wc -l
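To see why this counts records even when everything sits on one line, here is a small check on made-up sample data, with xyz standing in for SEARCH_STRING. With -o, grep prints each match on its own line, so wc -l counts matches; grep -c would count matching lines instead and report 1 here.

```shell
# Made-up sample: three records on a single line, two of them mention xyz.
sample='<record>a xyz 1</record><record>b</record><record>xyz 2</record>'

# -o emits one output line per match, so wc -l counts matching records
# even though the input itself contains no newlines between records.
matches=$(printf '%s\n' "$sample" | grep -E -o 'xyz[^<]*</record>' | wc -l)
echo "$matches"
```

Note that grep still reads its input line by line, so like sed it holds the whole newline-free file in memory.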