I have a huge file on a Linux machine. The file is ~20GB and the space on my box is ~25GB. I want to split the file into ~100MB parts. I know there's a 'split' command, but that keeps the original file. I don't have enough space to keep the original. Any ideas on how this can be accomplished? I'll even work with any node modules if they make the task easier than bash.
My attempt:
#!/bin/bash
if [ $# -gt 2 ] || [ $# -lt 1 ] || [ ! -f "$1" ]; then
    echo "Usage: ${0##*/} <filename> [<split size in M>]" >&2
    exit 1
fi

bsize=${2:-100}                               # chunk size in MiB
bucket=$( echo "$bsize * 1024 * 1024" | bc )  # chunk size in bytes
size=$( stat -c '%s' "$1" )
chunks=$( echo "$size / $bucket" | bc )
rest=$( echo "$size % $bucket" | bc )
[ "$rest" -ne 0 ] && let chunks++

# Work backwards: dd copies the last chunk off the end of the file,
# then truncate cuts that chunk off the original. Repeat until empty.
while [ "$chunks" -gt 0 ]; do
    let chunks--
    fn=$( printf '%s_%03d.%s' "${1%.*}" $chunks "${1##*.}" )
    skip=$(( bsize * chunks ))
    dd if="$1" of="$fn" bs=1M skip=${skip} || exit 1
    truncate -c -s ${skip}M "$1" || exit 1
done
The above assumes bash(1) and the Linux implementations of stat(1), dd(1), and truncate(1). It should be about as fast as it gets, since it uses dd(1) to copy chunks of the original file, and it uses bc(1) to make sure arithmetic operations in the 20GB range don't overflow anything. However, the script was only tested on smaller files, so double-check it before running it against your data.
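As a quick sanity check of the dd + truncate idea, here is a self-contained sketch on a small test file (the 1 MiB size, 256 KiB chunks, and file names are my own choices for illustration, not from the post):

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"

# Create a 1 MiB test file and record its checksum.
dd if=/dev/urandom of=testfile bs=1024 count=1024 2>/dev/null
before=$(md5sum testfile | cut -d' ' -f1)

# Split into 256 KiB chunks, last chunk first, truncating as we go
# so the original file shrinks instead of being duplicated.
bs=$((256 * 1024))
size=$(stat -c '%s' testfile)
chunks=$(( (size + bs - 1) / bs ))
while [ "$chunks" -gt 0 ]; do
    chunks=$((chunks - 1))
    dd if=testfile of="part_$chunks" bs=$bs skip=$chunks 2>/dev/null
    truncate -s $((bs * chunks)) testfile
done

# Reassemble (glob order is fine for single-digit part counts) and verify.
cat part_* > restored
after=$(md5sum restored | cut -d' ' -f1)
[ "$before" = "$after" ] && echo "checksums match"
```

At no point does more than one extra chunk's worth of data exist on disk, which is what makes this workable with only ~5GB of headroom.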
You can use tail and truncate in a shell script to split a file in place while consuming the original. The file is split backwards so that each completed chunk can be cut off the end with truncate. Here is a sample Bash script:
#!/bin/bash
if [ -z "$2" ]; then
    echo "Usage: insplit.sh <splitsize> <filename>"
    exit 1
fi

FILE="$2"
SPLITSIZE="$1"
FILESIZE=$(stat -c '%s' "$FILE")
BLOCKCOUNT=$(( (FILESIZE + SPLITSIZE - 1) / SPLITSIZE ))  # round up
echo "Split count: $BLOCKCOUNT"
BLOCKCOUNT=$((BLOCKCOUNT - 1))

# Work backwards: copy the tail of the file into the last chunk,
# then truncate the original so the next pass sees a shorter file.
while [ "$BLOCKCOUNT" -ge 0 ]; do
    FNAME="$FILE.$BLOCKCOUNT"
    echo "writing $FNAME"
    OFFSET=$((BLOCKCOUNT * SPLITSIZE))
    BLOCKSIZE=$((FILESIZE - OFFSET))
    tail -c "$BLOCKSIZE" "$FILE" > "$FNAME"
    truncate -s "$OFFSET" "$FILE"
    FILESIZE=$((FILESIZE - BLOCKSIZE))
    BLOCKCOUNT=$((BLOCKCOUNT - 1))
done
I confirmed the results with a random file:
$ dd if=/dev/urandom of=largefile bs=512 count=1000
$ md5sum largefile
7ff913b62ef572265661a85f06417746 largefile
$ ./insplit.sh 200000 largefile
Split count: 3
writing largefile.2
writing largefile.1
writing largefile.0
$ cat largefile.0 largefile.1 largefile.2 | md5sum
7ff913b62ef572265661a85f06417746 -
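One caveat when putting the pieces back together: with ten or more parts, a plain `cat largefile.*` would read `largefile.10` before `largefile.2`, because globs sort lexically. A small self-contained sketch (the `f.N` file names and contents are hypothetical) that concatenates numeric suffixes in order instead:

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"

# Stand-in parts; a real split would produce largefile.0, largefile.1, ...
printf 'aaa' > f.0
printf 'bbb' > f.1
printf 'ccc' > f.2

# Walk the numeric suffixes in order until one is missing.
i=0
while [ -f "f.$i" ]; do
    cat "f.$i"
    i=$((i + 1))
done > restored
# restored now contains "aaabbbccc"
```

Note that reassembling this way needs enough free space for a full second copy; the reverse of the in-place split (append a part, delete it, repeat) avoids that.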