
How to grab an arbitrary chunk from a file on Unix/Linux [duplicate]

Tags: bash, shell, unix

I'm trying to copy a chunk from one binary file into a new file. I have the byte offset and length of the chunk I want to grab.

I have tried using the dd utility, but this seems to read and discard the data up to the offset, rather than just seeking (I guess because dd is for copying/converting blocks of data). This makes it quite slow (and slower the higher the offset). This is the command I tried:

dd if=inputfile ibs=1 skip=$offset count=$datalength of=outputfile

I guess I could write a small Perl/Python/whatever script to open the file, seek to the offset, then read and write the required amount of data in chunks.

Is there a utility that supports something like this?

kevinm asked Aug 13 '09 15:08



3 Answers

You can use tail -c+N to output everything from byte N onward (that is, to trim the leading N-1 bytes from the input), then use head -cM to output only the first M bytes of what remains.

$ echo "hello world 1234567890" | tail -c+9 | head -c6
rld 12

So using your variables, and assuming $offset is a zero-based byte offset (tail counts bytes from 1), it would probably be:

tail -c+$((offset + 1)) inputfile | head -c$datalength > outputfile


Ah, I didn't see that it had to seek. Leaving this as community wiki.
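
For anyone adapting the tail/head pipeline, here is a minimal sketch that wraps it in a function taking a zero-based offset. The name extract_chunk and the sample data are made up for illustration; the only real point is the +1, since tail -c +N counts bytes from 1.

```shell
# extract_chunk FILE OFFSET LENGTH  (hypothetical helper name)
# OFFSET is zero-based; tail -c +N is one-based, hence the +1.
extract_chunk() {
  tail -c +"$(($2 + 1))" "$1" | head -c "$3"
}

# Illustrative sample data, not from the question
sample=$(mktemp)
printf 'hello world 1234567890\n' > "$sample"
extract_chunk "$sample" 8 6   # prints: rld 12
rm -f "$sample"
```

Whether tail seeks or reads through the skipped bytes depends on the implementation and on whether the input is a regular file, so this may still be slow for large offsets.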
Mark Rushakoff answered Jan 02 '23 17:01


Yes, it's awkward to do this with dd today. We're considering adding skip_bytes and count_bytes parameters to dd in coreutils to help. The following should work, though:

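#!/bin/sh

bs=100000
infile=$1
skip=$2
length=$3

(
  # Skip $skip bytes of stdin; count=0 means nothing is copied
  dd bs=1 skip="$skip" count=0
  # Copy as many whole blocks as fit in $length
  dd bs="$bs" count=$(($length / $bs))
  # Copy the remainder, if any (dd rejects bs=0)
  [ $(($length % $bs)) -eq 0 ] || dd bs=$(($length % $bs)) count=1
) < "$infile"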
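
For reference, newer GNU coreutils releases accept skip_bytes and count_bytes as iflag values, which makes the three-step workaround unnecessary there. The sketch below assumes such a GNU dd is installed (other dd implementations may reject these flags); the function name dd_chunk and the sample data are hypothetical.

```shell
# dd_chunk FILE OFFSET LENGTH  (hypothetical helper name)
# Assumes a GNU dd that supports iflag=skip_bytes,count_bytes.
# With these flags, skip= and count= are byte counts rather than block
# counts, so a large bs can be used for the actual copying.
dd_chunk() {
  # 2>/dev/null silences dd's transfer statistics (and also its errors)
  dd if="$1" bs=64K iflag=skip_bytes,count_bytes skip="$2" count="$3" 2>/dev/null
}

# Illustrative sample data, not from the question
sample=$(mktemp)
printf 'hello world 1234567890\n' > "$sample"
dd_chunk "$sample" 8 6   # prints: rld 12
rm -f "$sample"
```

In the question's terms this would be dd_chunk inputfile "$offset" "$datalength" > outputfile.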
pixelbeat answered Jan 02 '23 15:01


Thanks for the other answers. Unfortunately, I'm not in a position to install additional software, so the ddrescue option is out. The head/tail solution is interesting (I didn't realise you could supply + to tail), but scanning through the data makes it quite slow.

I ended up writing a small Python script to do what I wanted. The buffer size should probably be tuned to match some external buffer setting, but the value below is fast enough on my system.

#!/usr/bin/env python3

import sys

BUFFER_SIZE = 100000

# Read args
if len(sys.argv) < 4:
    print("Usage: %s input_file start_pos length" % sys.argv[0], file=sys.stderr)
    sys.exit(1)
input_filename = sys.argv[1]
start_pos = int(sys.argv[2])
length = int(sys.argv[3])

# Open the file in binary mode and seek to the start position
with open(input_filename, "rb") as infile:
    infile.seek(start_pos)

    # Read and write data in chunks
    while length > 0:
        # Read at most BUFFER_SIZE bytes, never more than what is left
        chunk = infile.read(min(BUFFER_SIZE, length))
        amount_read = len(chunk)

        # Check for EOF
        if not amount_read:
            print("Reached EOF, exiting...", file=sys.stderr)
            sys.exit(1)

        # Write the raw bytes to stdout
        sys.stdout.buffer.write(chunk)
        length -= amount_read
kevinm answered Jan 02 '23 17:01