
Skipping chunk of a big file with bash

Tags: grep, bash, awk

Using bash, how can I run awk/grep starting from the middle of a given file, skipping the first 1GB for instance? In other words, I don't want awk/grep to search through the first 1GB of the file; I want the search to start partway through it.

Raph the raph asked Oct 31 '25 01:10


1 Answer

You can use dd like this:

# make a 10GB file of zeroes
dd if=/dev/zero bs=1G count=10 > file

# read it, skipping first 9GB and count what you get
dd if=file bs=1G skip=9 | wc -c
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 1.84402 s, 582 MB/s
1073741824

Note that I am just demonstrating the concept of how easily you can skip 9GB. In practice, you may prefer to use a 100MB memory buffer and skip 90 of them (roughly 9GB), rather than allocating a whole 1GB buffer:

dd if=file bs=100M skip=90 | wc -c
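
As a rough sketch of the arithmetic behind bs and skip: dd skips bs × skip bytes, so the skip count for a given offset is the offset divided by the block size, rounded down. The helper name blocks_to_skip is hypothetical (not part of dd), and sizes are assumed to be binary units (1M = 1048576 bytes), which is how GNU dd reads them.

```shell
# Hypothetical helper: print the dd "skip=" count for a byte offset and a
# block size (both given in bytes). The offset is rounded down to a whole
# number of blocks, so dd starts at or slightly before the requested offset.
blocks_to_skip() {
    local offset_bytes=$1 block_bytes=$2
    echo $(( offset_bytes / block_bytes ))
}

mib=$(( 1024 * 1024 ))
gib=$(( 1024 * mib ))

# How many 100MiB blocks fit into a 9GiB offset?
blocks_to_skip $(( 9 * gib )) $(( 100 * mib ))   # prints 92
```

The result can then be passed along as dd if=file bs=100M skip=92. By the same arithmetic, skip=90 starts 9000MiB into the file, a little short of a full 9GiB.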

Note also that I am piping to wc rather than awk because my test data is not line oriented - it is just zeros.

Or, if your record size is 30kB and you want to skip a million records and discard diagnostic output:

dd if=file bs=30K skip=1000000 2> /dev/null | awk ...

Note that:

  • your line numbers will be "wrong" in awk (because awk didn't "see" them), and
  • your first line may be incomplete (because dd isn't "line oriented") but I guess that doesn't matter.
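
If the incomplete first line does matter, one possible workaround is to drop it with tail -n +2 before the data reaches awk or grep. This is only a sketch: the function name skip_blocks and the small demo file are made up for illustration, and if the skip offset happens to land exactly on a line boundary, one complete line is dropped instead.

```shell
# Hypothetical helper: skip the first SKIP blocks of size BS in FILE, then
# drop the first (possibly partial) line so that downstream tools such as
# awk or grep only see complete lines.
skip_blocks() {
    local file=$1 bs=$2 skip=$3
    dd if="$file" bs="$bs" skip="$skip" 2>/dev/null | tail -n +2
}

# Small demo file: the numbers 1..1000, one per line (3893 bytes).
demo=$(mktemp)
seq 1 1000 > "$demo"

# Skip 2 blocks of 1001 bytes; offset 2002 falls in the middle of "528",
# so the fragment "8" is discarded and output starts at the next full line.
skip_blocks "$demo" 1001 2 | head -n 1    # prints 529

rm -f "$demo"
```

In the original pipeline this would look like skip_blocks file 30K 1000000 | awk ..., with the same caveat about line numbers.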

Note also that it is generally very advantageous to use a large block size. So, if you want 8MB, you will do much better with bs=1M count=8 than with bs=8 count=1000000, which will cause a million reads and writes of 8 bytes each. (GNU dd spells the suffix with an uppercase M; BSD/macOS dd has traditionally used a lowercase m.)
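
To see the block size effect for yourself, something like the following can be used. It is a minimal sketch: the function name copy_with_bs is hypothetical, block sizes are given in plain bytes so that no suffix is needed, and the data comes from /dev/zero, so it says nothing about disk speed.

```shell
# Hypothetical helper: copy COUNT blocks of BS bytes from /dev/zero through a
# pipe and print how many bytes arrived on the other side.
copy_with_bs() {
    local bs=$1 count=$2
    dd if=/dev/zero bs="$bs" count="$count" 2>/dev/null | wc -c
}

# Same total amount of data (8388608 bytes), very different number of
# read/write calls: 8 large ones versus 1048576 tiny ones.
copy_with_bs 1048576 8
copy_with_bs 8 1048576
```

Prefixing each call with bash's time keyword should show the second form taking noticeably longer, although the exact figures depend on the machine.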


Note also, that if you like processing very large files, you can get GNU Parallel to divide them up for processing in parallel by multiple subprocesses. So, for example, the following code takes the 10GB file we made at the start and starts 10 parallel jobs counting the bytes in each 1GB chunk:

parallel -a file --recend "" --pipepart --block 1G wc -c

Mark Setchell answered Nov 01 '25 18:11


