Using bash, how can I awk/grep from the middle of a given file, skipping the first 1 gig for instance? In other words, I don't want awk/grep to search through the first gig of the file; I want to start my search from the middle of the file.
You can use dd like this:
# make a 10GB file of zeroes
dd if=/dev/zero bs=1G count=10 > file
# read it, skipping first 9GB and count what you get
dd if=file bs=1G skip=9 | wc -c
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 1.84402 s, 582 MB/s
1073741824
Note that I am just demonstrating the concept of how easily you can skip 9GB. In practice, you may prefer to use a 100MB memory buffer and skip 90 of them, rather than allocating a whole gigabyte:
dd if=file bs=100M skip=90 | wc -c
Note also that I am piping to wc rather than awk because my test data is not line-oriented; it is just zeros.
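Coming back to the original question: for a real, line-oriented file you can pipe the skipped stream straight into grep or awk instead of wc. Since the skip lands at a byte offset rather than a line boundary, the first line you see will most likely be a partial one, so you may want to discard it. A minimal sketch, where bigfile and 'pattern' are placeholders for your own file and search term:
# skip the first 1GiB, drop the (probably partial) first line, then search the rest
dd if=bigfile bs=1G skip=1 2>/dev/null | tail -n +2 | grep 'pattern'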
Or, if your record size is 30kB and you want to skip a million records and discard diagnostic output:
dd if=file bs=30K skip=1000000 2> /dev/null | awk ...
Note that:
- the skipped data never reaches awk (because awk didn't "see" it), and
- the skip is measured in blocks of bytes, not lines (dd isn't "line oriented"), but I guess that doesn't matter.
Note also that it is generally very advantageous to use a large block size. So, if you want to transfer 8MB, you will do much better with bs=1M count=8 than with bs=8 count=1000000, which will cause a million reads and writes of 8 bytes each.
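If you want to see the block-size effect for yourself, here is a quick comparison; the absolute timings will vary by machine, but both commands move the same 8MB:
# 8 reads/writes of 1MiB each - typically fast
time dd if=/dev/zero of=/dev/null bs=1M count=8
# a million reads/writes of 8 bytes each - typically much slower
time dd if=/dev/zero of=/dev/null bs=8 count=1000000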
Note also, that if you like processing very large files, you can get GNU Parallel to divide them up for processing in parallel by multiple subprocesses. So, for example, the following code takes the 10GB file we made at the start and starts 10 parallel jobs counting the bytes in each 1GB chunk:
parallel -a file --recend "" --pipepart --block 1G wc -c
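Closer to the original question, and assuming your file is line-oriented (so GNU Parallel's default newline record end applies), you could instead have each job grep its own chunk; 'pattern' is a placeholder:
parallel -a file --pipepart --block 1G grep -c 'pattern'
Each job prints the number of matching lines in its roughly-1GB chunk, and the per-chunk counts come back as the parallel jobs finish.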