I have a file with some statistics like this:
2023-01-01 01:00:00 TOTAL MEMORY ALLOCATION CONSUMPTION:
2023-01-01 01:00:00 COMPONENT | USAGE (%)
2023-01-01 01:00:00 class.zzz.aaa.bbb | 32
2023-01-01 01:00:00 class.fff.aaa.ggg | 20
2023-01-01 01:00:00 TOTAL: 52% out of 100% allocated memory consumed
2023-01-01 01:00:00 TOTAL MEMORY ALLOCATION CONSUMPTION:
2023-01-02 01:00:00 COMPONENT | USAGE (%)
2023-01-02 01:00:00 class.xxx.aaa.bbb | 42
2023-01-02 01:00:00 class.bbb.aaa.zzz | 10
2023-01-02 01:00:00 class.zzz.xxx | 21
2023-01-02 01:00:00 class.xxx.sss.ggg | 5
2023-01-02 01:00:00 TOTAL: 78% out of 100% allocated memory consumed
2023-01-01 01:00:00 TOTAL MEMORY ALLOCATION CONSUMPTION:
2023-01-03 01:00:00 COMPONENT | USAGE (%)
2023-01-03 01:00:00 class.xxx.yyy.zzz | 10
2023-01-03 01:00:00 class.xxx.zzz.aaa | 20
2023-01-03 01:00:00 class.zzz.aaa.bbb | 30
2023-01-03 01:00:00 TOTAL: 60% out of 100% allocated memory consumed
and I would like to cut out the last set of statistics (in the example above it would be the last 6 lines). As you can see, the number of lines in each section can change, but the first and the last line of a section always have the same form.
I ended up with this regex: (?m)^.*?TOTAL(?s).*?(?m)TOTAL.*?$
To use it on Linux, I ran the following command, using grep's -P regex extension (I haven't had much luck with the -E regex extension):
tac con.log | grep -Po "(?m)^.*?TOTAL(?s).*?(?m)TOTAL.*?\$" -m1 | tac
which resulted in this correct output
2023-01-01 01:00:00 TOTAL MEMORY ALLOCATION CONSUMPTION:
2023-01-03 01:00:00 COMPONENT | USAGE (%)
2023-01-03 01:00:00 class.xxx.yyy.zzz | 10
2023-01-03 01:00:00 class.xxx.zzz.aaa | 20
2023-01-03 01:00:00 class.zzz.aaa.bbb | 30
2023-01-03 01:00:00 TOTAL: 60% out of 100% allocated memory consumed
as expected. However, this was in my testing environment, which uses an old grep, version 2.5.3. When I tried it on my other machine running Rocky Linux 9, which uses grep version 3.6, I got no match at all. Since this regex also worked when I tested it at regex101.com, I believe this might be a nuance of newer grep. Do these newer versions of grep require anything special for a regex like this to work, or is there another way to get this result (ultimately, it will be used in a bash script)?
With Perl,† one way is
perl -0777 -wnE'$r = $1 while /(^[0-9\s:-]+TOTAL.+? TOTAL.+?$)/smxg; say $r' file
or
perl -0777 -wnE'say for /.*( ^[0-9\s:-]+ TOTAL.+? TOTAL.+?$ )/smxg' file
The first version captures and assigns every such record in turn until it gets to the last one, while the second matches through the whole file before capturing the last record. Either way one has to go over the whole file, but only once; the approach from the question makes three passes over the file. We can process backwards if performance is an issue, like here for example. See the performance effect here.
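As a rough illustration of processing backwards without a multi-line regex, here is a minimal sketch that uses tac and awk rather than Perl. The function name last_block_reverse is made up for this example, and it assumes every record starts with a "TOTAL MEMORY ALLOCATION CONSUMPTION:" line and ends with a " TOTAL: " line, as in the sample from the question.

```shell
# Hypothetical helper: print the last record by reading the file from the end.
# Assumes the record layout shown in the question's sample data.
last_block_reverse() {
  tac "$1" | awk '
    # In reversed order the closing "TOTAL:" line of the last record comes first.
    / TOTAL: / { inblock = 1 }
    # Print every line while we are inside that record.
    inblock { print }
    # The header line is the last line of the record in reversed order: stop there.
    inblock && /TOTAL MEMORY ALLOCATION CONSUMPTION:/ { exit }
  ' | tac
}

# Usage (con.log is the file name from the question):
#   last_block_reverse con.log
```

Since awk exits at the first header line it meets, it only processes the tail of the file, and the final tac restores the original line order.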
Altogether I'd recommend a short script instead.
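Such a script could look something like the following. This is only a sketch, written with awk so that it drops straight into a bash script; the function name last_block is invented here, and the two patterns assume the header and "TOTAL:" lines look exactly like those in the question.

```shell
# Hypothetical helper: print the last complete record of the given file.
# Makes a single forward pass and keeps only the most recent complete record.
last_block() {
  awk '
    # A header line starts a fresh record; discard whatever was buffered.
    /TOTAL MEMORY ALLOCATION CONSUMPTION:/ { buf = ""; inblock = 1 }
    # Collect the lines of the current record.
    inblock { buf = buf $0 "\n" }
    # The "TOTAL:" line closes the record; remember it as the latest one.
    inblock && / TOTAL: / { last = buf; inblock = 0 }
    # After the whole file is read, print the last completed record.
    END { printf "%s", last }
  ' "$1"
}

# Usage (con.log is the file name from the question):
#   last_block con.log
```

A record that was cut off before its closing "TOTAL:" line is ignored here, so the output is always the last complete set of statistics.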
Not sure why grep does what you show; I'd imagine that the above regex should work, even slightly simplified using grep's conventions.
† In the question as originally posted by the OP there was a perl tag.