Below is a small excerpt of matches for the string 'octeon' in a 32-bit memory dump from a proprietary routing device. Reading each line from the right: the last 16 characters are the ASCII rendering of the bytes (non-printable bytes shown as dots), preceded by four 32-bit words (8 hex characters each, of course), preceded by the address offset.
000b27a0: 41646a75 7374206f 6374656f 6e5f6970 Adjust octeon_ip
000b2850: 73740a00 00000000 6f637465 6f6e5f72 st......octeon_r
000b2870: 5f73697a 65000000 6f637465 6f6e5f72 _size...octeon_r
000b2990: 6164696e 672e0a00 6f637465 6f6e5f72 ading...octeon_r
000b29b0: 785f7369 7a650000 6f637465 6f6e5f72 x_size..octeon_r
000b3050: 780a0000 00000000 6f637465 6f6e5f70 x.......octeon_p
000b3650: 6564204f 6374656f 6e206d6f 64656c0a ed Octeon model.
000bade0: 20307825 71780a00 6f637465 6f6e5f6c 0x%qx..octeon_l
000bafd0: 696e6720 4f637465 6f6e2045 78656375 ing Octeon Execu
000bd710: 6564204f 6374656f 6e204d6f 64656c21 ed Octeon Model!
000bd950: 4f435445 4f4e2070 61737320 3120646f OCTEON pass 1 do
000bda20: 6564206f 6374656f 6e206d6f 64656c3a ed octeon model:
While that data contains some useful information, tragically, the operating system (HiveOS) makes no attempt to allocate memory contiguously or to coalesce disparate heaps (and why should it?), so the vast majority of memory is a barren, yet-to-be-malloc'd heap.
0004d6b0: 00000000 00000000 00000000 00000000 ................
0004d6c0: 00000000 00000000 00000000 00000000 ................
0004d6d0: 00000000 00000000 00000000 00000000 ................
0004d6e0: 00000000 00000000 00000000 00000000 ................
0004d6f0: 00000000 00000000 00000000 00000000 ................
0004d700: 00000000 00000000 00000000 00000000 ................
0004d710: 00000000 00000000 00000000 00000000 ................
0004d720: 00000000 00000000 00000000 00000000 ................
0004d730: 00000000 00000000 00000000 00000000 ................
0004d740: 00000000 00000000 00000000 00000000 ................
0004d750: 00000000 00000000 00000000 00000000 ................
I'd like to quickly and efficiently pull out strings of a certain size matching some arbitrary regular expression pattern ([a-zA-Z] comes to mind).
You might naturally think that running the perennial object dump examination favorite 'strings' would yield a result, but the md util is a cruel mistress -- due to the presence of ASCII-coded hexadecimal banks and addresses, it identifies every line as containing a 'string'.
Sure, we all know there exists a trivial scripting solution (for line in hexdump: f.write(line[-16:]) + grep '[A-Za-z]' f).
However, sometimes I'm struck with the feeling that I should come to understand these dastardly, oppressive, yet misunderstood regular expressions better, rather than slinking back to my easy-to-use newfangled programmin' languages. I really feel I can't start growing a real Unix neckbeard until I've replaced my entire development toolchain with various stream editor and Awk regular expressions.
How does one match [a-zA-Z] within a certain number of characters from the end of the line (in my case, 16)? It seems like a pretty pithy construction, but all combinations of +, ?, {16} and otherwise that made sense to me in the past few minutes have promptly failed.
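For reference, the "trivial scripting solution" alluded to in the question might look like the minimal Python sketch below. The sample lines are copied from the excerpts in the question; the function name `ascii_columns` and its `width` parameter are made up for illustration, and the sketch assumes every line ends with exactly 16 ASCII-column characters and no trailing whitespace.

```python
import re

# Sample lines copied from the dump excerpts in the question.
DUMP_LINES = [
    "000b27a0: 41646a75 7374206f 6374656f 6e5f6970 Adjust octeon_ip",
    "0004d6b0: 00000000 00000000 00000000 00000000 ................",
    "000b3050: 780a0000 00000000 6f637465 6f6e5f70 x.......octeon_p",
]

LETTER = re.compile(r"[A-Za-z]")


def ascii_columns(lines, width=16):
    """Return the trailing `width` characters of each line that has a letter there."""
    hits = []
    for line in lines:
        tail = line[-width:]          # the ASCII column only, not the hex words
        if LETTER.search(tail):
            hits.append(tail)
    return hits


print(ascii_columns(DUMP_LINES))
# ['Adjust octeon_ip', 'x.......octeon_p']
```

Slicing before matching is what keeps the hex digits a-f in the address and word columns from counting as "letters".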
The ‹ ^ › and ‹ $ › anchors ensure that the regex matches the entire subject string; otherwise, it could match 10 characters within longer text. The ‹ [A-Z] › character class matches any single uppercase character from A to Z, and the interval quantifier ‹ {1,10} › repeats the character class from 1 to 10 times.
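A small Python sketch of that anchored pattern, with and without the anchors, may make the difference concrete. The names `ANCHORED`, `UNANCHORED`, and `is_short_upper` are illustrative only. Note that in Python `$` also matches just before a trailing newline, which the sketch does not exercise.

```python
import re

# ^ and $ pin the match to the whole string; {1,10} bounds its length.
ANCHORED = re.compile(r"^[A-Z]{1,10}$")
# Without anchors the same class can match inside longer text.
UNANCHORED = re.compile(r"[A-Z]{1,10}")


def is_short_upper(text):
    """True if `text` consists of 1 to 10 uppercase ASCII letters and nothing else."""
    return ANCHORED.search(text) is not None


for candidate in ["OCTEON", "Octeon", "ABCDEFGHIJK", ""]:
    print(repr(candidate), is_short_upper(candidate))

# The unanchored pattern finds a match inside a longer string.
print(UNANCHORED.search("pass 1 OCTEON model") is not None)
```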
To match the start or the end of a line, we use the following anchors: Caret (^) matches the position before the first character in the string. Dollar ($) matches the position right after the last character in the string.
To match a character that has special meaning in regex, you need to escape it by prefixing it with a backslash ( \ ). E.g., regex \. matches "." ; regex \+ matches "+" ; and regex \( matches "(" . You also need to use regex \\ to match "\" (backslash).
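The escaping rule can be checked quickly in Python. This is only a sketch: `contains_literal` is a hypothetical helper, and it leans on the standard library's `re.escape` to add the backslashes rather than writing them by hand.

```python
import re


def contains_literal(haystack, needle):
    """Search for `needle` as plain text, escaping any regex metacharacters in it."""
    return re.search(re.escape(needle), haystack) is not None


# An unescaped dot matches any character; an escaped dot matches only ".".
print(re.fullmatch(r"a.b", "axb") is not None)    # True
print(re.fullmatch(r"a\.b", "axb") is not None)   # False

# Hand-written escapes for "+", "(" and the backslash itself.
print(re.search(r"\+", "1+1") is not None)        # True
print(re.search(r"\(", "f(x)") is not None)       # True
print(re.search(r"\\", "a\\b") is not None)       # True

# The helper treats "a.b" as three literal characters.
print(contains_literal("axb", "a.b"))             # False
```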
$ means "Match the end of the string" (the position after the last character in the string).
Use the "non-matching" switch -v:
grep -v '\.\{16\}$'
This will strip out all lines ending with 16 dots.
Here's the man documentation for it:
-v, --invert-match
Invert the sense of matching, to select non-matching lines.
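The same inverted match can be sketched in Python for comparison. `invert_match` is a hypothetical name, and the sample lines are taken from the question. Python's `re` syntax writes the interval as `{16}` with no backslashes.

```python
import re

ENDS_WITH_16_DOTS = re.compile(r"\.{16}$")


def invert_match(lines, pattern=ENDS_WITH_16_DOTS):
    """Keep only the lines that do NOT match `pattern`, like grep -v."""
    return [line for line in lines if not pattern.search(line)]


sample = [
    "000b3650: 6564204f 6374656f 6e206d6f 64656c0a ed Octeon model.",
    "0004d6c0: 00000000 00000000 00000000 00000000 ................",
    "000b3050: 780a0000 00000000 6f637465 6f6e5f70 x.......octeon_p",
]

for line in invert_match(sample):
    print(line)
```

One caveat of this filter: a row whose 16 bytes are all non-printable but not all zero is rendered as 16 dots too, so it is dropped along with the empty heap rows.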
Does this do what you want? ".{16}$"
That will match any 16 characters from the end of the line. The $ ensures it matches the end of the line.
After closer inspection, if you want to extract only the lines that are not all periods, you could use this regex: " {4}(.*?\w.*?)$". There is a space before the {4} so that it matches the delimiter between the digits and the end of the line. It's not technically "only 16 characters," but given the data set, it does appear to provide the desired output. (Assuming the desired output is any line that has a word character in it, which is letters/numbers/underscore.)
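Here is a hedged Python sketch of that capture-group pattern. It assumes the raw dump separates the hex words from the ASCII column with four spaces, as the pattern requires; the two sample rows are re-spaced by hand to fit that assumption, and `word_tail` is a made-up helper name.

```python
import re

# Four literal spaces, then a capture that must contain at least one \w.
TAIL_WITH_WORD_CHAR = re.compile(r" {4}(.*?\w.*?)$")

# Hypothetical rows with a four-space gap before the ASCII column.
rows = [
    "000b3050: 780a0000 00000000 6f637465 6f6e5f70    x.......octeon_p",
    "0004d6b0: 00000000 00000000 00000000 00000000    ................",
]


def word_tail(line):
    """Return the text after the four-space gap, or None if it has no word character."""
    match = TAIL_WITH_WORD_CHAR.search(line)
    return match.group(1) if match else None


for row in rows:
    print(word_tail(row))
```

Because `\w` also covers digits and the underscore, a column such as `0x%qx..` would be reported even if it held no letters.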
A cheap trick to filter interesting lines is to pad the match with arbitrary characters up to the end of the line. Here I select a character that is not a period and that is no further than 15 characters from the end of the line. (You are using POSIX basic regex, so you should write the repetition quantifier between \{ \} and not { }.)
grep '[^.].\{0,15\}$'
Then you can pipe the result into another grep to test it, or you can adapt the idea to another regex:
grep 'abc.\{0,13\}$'
will match the string "abc" in the last 16 characters.
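The padding trick generalises to any fixed-length sub-pattern. The Python sketch below is one way to build such patterns; `tail_pattern` and its parameters are invented for illustration, and it uses a lower bound of 0 on the padding so that a match flush with the end of the line also counts.

```python
import re


def tail_pattern(atom, atom_len, width=16):
    """Compile a regex that matches `atom` entirely inside the last `width` characters.

    `atom` must be a sub-pattern that always consumes exactly `atom_len` characters.
    """
    return re.compile(r"(?:%s).{0,%d}$" % (atom, width - atom_len))


letter_in_tail = tail_pattern(r"[A-Za-z]", 1)   # (?:[A-Za-z]).{0,15}$
abc_in_tail = tail_pattern(r"abc", 3)           # (?:abc).{0,13}$

text_row = "000b3050: 780a0000 00000000 6f637465 6f6e5f70 x.......octeon_p"
zero_row = "0004d6b0: 00000000 00000000 00000000 00000000 ................"

print(letter_in_tail.search(text_row) is not None)  # True
print(letter_in_tail.search(zero_row) is not None)  # False
```

The second row contains the hex letters `d` and `b` in its address, but they sit more than 16 characters from the end, so the anchored padding rejects them.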