How to grab an arbitrary chunk from a file on Unix/Linux [duplicate]

Tags: bash, shell, unix

I'm trying to copy a chunk from one binary file into a new file. I have the byte offset and length of the chunk I want to grab.

I have tried using the dd utility, but this seems to read and discard the data up to the offset, rather than just seeking (I guess because dd is for copying/converting blocks of data). This makes it quite slow (and slower the higher the offset). This is the command I tried:

dd if=inputfile ibs=1 skip=$offset count=$datalength of=outputfile

I guess I could write a small Perl/Python/whatever script to open the file, seek to the offset, then read and write the required amount of data in chunks.

Is there a utility that supports something like this?

asked Aug 13 '09 15:08

kevinm

3 Answers

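You can use tail -c +N to output everything from byte N onward (bytes are numbered from 1, so this trims the leading N-1 bytes), then use head -c M to output only the first M bytes of its input.

$ echo "hello world 1234567890" | tail -c +9 | head -c 6
rld 12

So using your variables, and assuming $offset is the zero-based number of bytes to skip (as in your dd command), it would probably be:

tail -c +$((offset + 1)) inputfile | head -c "$datalength" > outputfile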

Ah, I didn't see that it had to seek. Leaving this as community wiki.

answered Jan 02 '23 17:01

Mark Rushakoff

Yes, it's awkward to do this with dd today. We're considering adding skip_bytes and count_bytes params to dd in coreutils to help. The following should work though:

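#!/bin/sh
# Usage: script infile skip length  (the chunk is written to stdout)

bs=100000
infile=$1
skip=$2
length=$3

(
  # Seek to the start offset without copying anything
  dd bs=1 skip="$skip" count=0
  # Copy the whole blocks
  dd bs="$bs" count=$((length / bs))
  # Copy the remainder, if any (dd rejects bs=0)
  if [ $((length % bs)) -gt 0 ]; then
    dd bs=$((length % bs)) count=1
  fi
) < "$infile"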
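
For reference, the skip_bytes and count_bytes params mentioned above did later land in GNU dd as iflag values (in coreutils 8.16, as far as I know). The following is only a sketch under that assumption: extract_chunk is a made-up helper name, the 64 KiB block size is an arbitrary choice, and dd from other systems (BSD, older coreutils) may reject these flags.

```shell
#!/bin/sh
# Sketch: copy LENGTH bytes starting at zero-based byte OFFSET.
# Assumes GNU dd with iflag=skip_bytes,count_bytes support.
# extract_chunk is a hypothetical helper name, not part of coreutils.
extract_chunk() {
    # $1 = input file, $2 = byte offset, $3 = byte length, $4 = output file
    # With these iflags, skip= and count= are read as bytes rather than
    # blocks, so bs can stay large and dd can seek straight to the offset.
    dd if="$1" of="$4" bs=65536 iflag=skip_bytes,count_bytes \
       skip="$2" count="$3"
}
```

With the question's variables this would be called as: extract_chunk inputfile "$offset" "$datalength" outputfile. dd still prints its transfer statistics on stderr.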

answered Jan 02 '23 15:01

pixelbeat

Thanks for the other answers. Unfortunately, I'm not in a position to install additional software, so the ddrescue option is out. The head/tail solution is interesting (I didn't realise you could supply + to tail), but scanning through the data makes it quite slow.

I ended up writing a small Python script to do what I wanted. The buffer size should probably be tuned to match some external buffer setting, but the value below performs well enough on my system.

#!/usr/bin/env python3

import sys

BUFFER_SIZE = 100000

# Read args
if len(sys.argv) < 4:
    print("Usage: %s input_file start_pos length" % (sys.argv[0],), file=sys.stderr)
    sys.exit(1)
input_filename = sys.argv[1]
start_pos = int(sys.argv[2])
length = int(sys.argv[3])

# Open file in binary mode and seek to start pos
infile = open(input_filename, "rb")
infile.seek(start_pos)

# Read and write data in chunks
while length > 0:
    # Read data
    data = infile.read(min(BUFFER_SIZE, length))
    amount_read = len(data)

    # Check for EOF
    if not amount_read:
        print("Reached EOF, exiting...", file=sys.stderr)
        sys.exit(1)

    # Write raw bytes to stdout
    sys.stdout.buffer.write(data)
    length -= amount_read

answered Jan 02 '23 17:01

kevinm
