Using bash, how can I awk/grep from the middle of a given file, skipping the first 1 gig for instance? In other words, I don't want awk/grep to search through the first gig of the file; I want to start my search from the middle of the file.
You can use dd like this:
# make a 10GB file of zeroes
dd if=/dev/zero bs=1G count=10 > file
# read it, skipping first 9GB and count what you get
dd if=file bs=1G skip=9 | wc -c
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 1.84402 s, 582 MB/s
1073741824
Note that I am just demonstrating the concept of how easily you can skip 9GB. In practice, you may prefer to use a 100MB memory buffer and skip 90 of them, rather than allocating a whole gigabyte:
dd if=file bs=100M skip=90 | wc -c
Note also that I am piping to wc rather than awk because my test data is not line-oriented; it is just zeros.
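Coming back to the original question: for a real, line-oriented file you can pipe the skipped stream straight into grep or awk instead of wc. Since the skip lands at a byte offset rather than a line boundary, the first line you see will most likely be a partial one, so you may want to discard it. A minimal sketch, where bigfile and 'pattern' are placeholders for your own file and search term:
# skip the first 1GiB, drop the (probably partial) first line, then search the rest
dd if=bigfile bs=1G skip=1 2>/dev/null | tail -n +2 | grep 'pattern'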
Or, if your record size is 30kB and you want to skip a million records and discard diagnostic output:
dd if=file bs=30K skip=1000000 2> /dev/null | awk ...
Note that:
- the skipped data never reaches awk (because awk didn't "see" it), and
- the skip is measured in blocks of bytes, not lines (dd isn't "line oriented"), but I guess that doesn't matter.
Note also that it is generally very advantageous to use a large block size. So, if you want to transfer 8MB, you will do much better with bs=1M count=8 than with bs=8 count=1000000, which will cause a million reads and writes of 8 bytes each.
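If you want to see the block-size effect for yourself, here is a quick comparison; the absolute timings will vary by machine, but both commands move the same 8MB:
# 8 reads/writes of 1MiB each - typically fast
time dd if=/dev/zero of=/dev/null bs=1M count=8
# a million reads/writes of 8 bytes each - typically much slower
time dd if=/dev/zero of=/dev/null bs=8 count=1000000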
Note also, that if you like processing very large files, you can get GNU Parallel to divide them up for processing in parallel by multiple subprocesses. So, for example, the following code takes the 10GB file we made at the start and starts 10 parallel jobs counting the bytes in each 1GB chunk:
parallel -a file --recend "" --pipepart --block 1G wc -c
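Closer to the original question, and assuming your file is line-oriented (so GNU Parallel's default newline record end applies), you could instead have each job grep its own chunk; 'pattern' is a placeholder:
parallel -a file --pipepart --block 1G grep -c 'pattern'
Each job prints the number of matching lines in its roughly-1GB chunk, and the per-chunk counts come back as the parallel jobs finish.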