I'm trying to copy a chunk from one binary file into a new file. I have the byte offset and length of the chunk I want to grab.
I have tried using the dd utility, but it seems to read and discard the data up to the offset rather than just seeking there (I guess because dd is designed for copying/converting blocks of data). This makes it quite slow, and slower the higher the offset. This is the command I tried:
dd if=inputfile ibs=1 skip=$offset count=$datalength of=outputfile
I guess I could write a small Perl/Python/whatever script to open the file, seek to the offset, then read and write the required amount of data in chunks.
Is there a utility that supports something like this?
You can use tail -c+N to output everything from byte N of its input onwards (note that tail counts from 1, so this drops the first N-1 bytes), then pipe that into head -cM to keep only the first M bytes:
$ echo "hello world 1234567890" | tail -c+9 | head -c6
rld 12
So using your variables, and adding 1 because your offset is the number of bytes to skip (as in your dd skip=$offset) while tail's +N names the first byte to keep, it would be:
tail -c+$((offset + 1)) inputfile | head -c$datalength > outputfile
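To sanity-check the arithmetic against the example above: skipping offset=8 bytes of "hello world 1234567890" should land on the r in "world":
$ offset=8; datalength=6
$ echo "hello world 1234567890" | tail -c+$((offset + 1)) | head -c$datalength
rld 12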
Yes, it's awkward to do this with dd today. We're considering adding skip_bytes and count_bytes parameters to dd in coreutils to help. The following should work, though:
#!/bin/sh
# Usage: extract.sh infile offset length > outputfile
bs=100000
infile=$1
skip=$2
length=$3
(
  # Position the shared input fd at the offset; count=0 copies
  # nothing, and GNU dd seeks rather than reads when the input
  # is seekable
  dd bs=1 skip="$skip" count=0
  # Copy the bulk of the chunk in large blocks
  dd bs="$bs" count=$(( length / bs ))
  # Copy any remaining partial block (dd rejects bs=0)
  if [ $(( length % bs )) -ne 0 ]; then
    dd bs=$(( length % bs )) count=1
  fi
) < "$infile"
Thanks for the other answers. Unfortunately, I'm not in a position to install additional software, so the ddrescue option is out. The head/tail solution is interesting (I didn't realise you could supply + to tail), but scanning through the data makes it quite slow.
I ended up writing a small Python script to do what I wanted. The buffer size could probably be tuned to match some external buffer setting, but the value below is performant enough on my system.
#!/usr/local/bin/python
import sys

BUFFER_SIZE = 100000

# Read args
if len(sys.argv) < 4:
    print >> sys.stderr, "Usage: %s input_file start_pos length" % (sys.argv[0],)
    sys.exit(1)
input_filename = sys.argv[1]
start_pos = int(sys.argv[2])
length = int(sys.argv[3])

# Open file in binary mode and seek to start pos
input_file = open(input_filename, 'rb')
input_file.seek(start_pos)

# Read and write data in chunks
while length > 0:
    # Read data, never more than the amount still outstanding
    buffer = input_file.read(min(BUFFER_SIZE, length))
    amount_read = len(buffer)
    # Check for EOF
    if not amount_read:
        print >> sys.stderr, "Reached EOF, exiting..."
        sys.exit(1)
    # Write data
    sys.stdout.write(buffer)
    length -= amount_read
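Saved as, say, extract.py (the name is mine, pick anything), it writes the chunk to stdout just like the tail/head pipeline:
$ python extract.py inputfile $offset $datalength > outputfile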