 

Linux: Using split on limited space

I have a huge file on a Linux machine. The file is ~20 GB and the space on my box is ~25 GB. I want to split the file into ~100 MB parts. I know there's a 'split' command, but it keeps the original file, and I don't have enough space to keep the original alongside the parts. Any ideas on how this can be accomplished? I'm even willing to use Node modules if they make the task easier than bash.

Light asked Jun 16 '15 03:06



2 Answers

My attempt:

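#!/bin/bash

# Require one or two arguments, the first being an existing regular file.
if [ $# -gt 2 ] || [ $# -lt 1 ] || [ ! -f "$1" ]; then
    echo "Usage: ${0##*/} <filename> [<split size in M>]" >&2
    exit 1
fi

bsize=${2:-100}                                # chunk size in MiB
bucket=$( echo "$bsize * 1024 * 1024" | bc )   # chunk size in bytes
size=$( stat -c '%s' "$1" )                    # file size in bytes
chunks=$( echo "$size / $bucket" | bc )
rest=$( echo "$size % $bucket" | bc )
[ "$rest" -ne 0 ] && let chunks++              # count a partial last chunk

# Work backwards: copy the last chunk out, then cut it off the original.
while [ "$chunks" -gt 0 ]; do
    let chunks--
    fn=$( printf '%s_%03d.%s' "${1%.*}" "$chunks" "${1##*.}" )
    skip=$(( bsize * chunks ))                 # offset of this chunk in MiB
    dd if="$1" of="$fn" bs=1M skip="$skip" || exit 1
    truncate -c -s "${skip}M" "$1" || exit 1
done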

The above assumes bash(1), and Linux implementations of stat(1), dd(1), and truncate(1). It should be pretty much as fast as it gets, since it uses dd(1) to copy chunks of the initial file. It also uses bc(1) to make sure arithmetic operations in the 20GB range don't overflow anything. However, the script was only tested on smaller files, so double check it before running it against your data.
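As an aside, the chunk count can also be computed with bash's built-in arithmetic, which is 64-bit on current Linux builds (an assumption worth checking on unusual platforms). The following is only a sketch of that calculation, not a replacement for the script above; chunk_count is a hypothetical helper name introduced here for illustration.

```shell
#!/bin/bash
# chunk_count is a hypothetical helper, not part of the script above.
# Prints how many chunks of <chunk_mib> MiB are needed for <size_bytes> bytes.
chunk_count() {
    local size=$1 chunk_mib=$2
    local bucket=$(( chunk_mib * 1024 * 1024 ))   # chunk size in bytes
    # Ceiling division: round up so a partial last chunk is counted.
    echo $(( (size + bucket - 1) / bucket ))
}

# Example: a 20 GiB file split into 100 MiB chunks.
chunk_count $(( 20 * 1024 * 1024 * 1024 )) 100
```

The ceiling division replaces the separate remainder check, so a file that is an exact multiple of the chunk size does not get an extra empty chunk.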

lcd047 answered Oct 18 '22 16:10



You can use tail and truncate in a shell script to split a file in place, destroying the original file in the process. The file is split backwards, from the last chunk to the first, so that each chunk can be cut off the end of the original with truncate once it has been copied out. Here is a sample Bash script:

#!/bin/bash

if [ -z "$2" ]; then
   echo "Usage: insplit.sh <splitsize> <filename>"
   exit 1
fi

FILE="$2"
SPLITSIZE="$1"   # chunk size in bytes

FILESIZE=$(stat -c '%s' "$FILE")
# Number of chunks, rounded up so a partial last chunk is counted.
BLOCKCOUNT=$(( (FILESIZE + SPLITSIZE - 1) / SPLITSIZE ))
echo "Split count: $BLOCKCOUNT"

# Walk from the last chunk down to chunk 0.
BLOCKCOUNT=$(( BLOCKCOUNT - 1 ))
while [ "$BLOCKCOUNT" -ge 0 ]; do
  FNAME="$FILE.$BLOCKCOUNT"
  echo "writing $FNAME"
  OFFSET=$(( BLOCKCOUNT * SPLITSIZE ))
  BLOCKSIZE=$(( FILESIZE - OFFSET ))
  # Copy the tail into the chunk; stop before truncating if the copy fails.
  tail -c "$BLOCKSIZE" "$FILE" > "$FNAME" || exit 1
  truncate -s "$OFFSET" "$FILE"
  FILESIZE=$(( FILESIZE - BLOCKSIZE ))
  BLOCKCOUNT=$(( BLOCKCOUNT - 1 ))
done

I confirmed the results with a random file:

$ dd if=/dev/urandom of=largefile bs=512 count=1000
$ md5sum largefile
7ff913b62ef572265661a85f06417746  largefile
$ ./insplit.sh 200000 largefile
Split count: 3
writing largefile.2
writing largefile.1
writing largefile.0
$ cat largefile.0 largefile.1 largefile.2 | md5sum
7ff913b62ef572265661a85f06417746  -
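One caveat when reassembling: with ten or more parts, a plain glob such as largefile.* sorts lexicographically (largefile.10 comes before largefile.2), so the parts need to be concatenated in numeric order. A minimal sketch, assuming the parts are named <name>.0, <name>.1, ... with no gaps, as insplit.sh produces them; join_parts is a hypothetical helper name, not part of the script above:

```shell
#!/bin/bash
# join_parts is a hypothetical helper: writes <name>.0, <name>.1, ...
# to stdout in numeric order, stopping at the first missing index.
join_parts() {
    local base=$1 i=0
    while [ -f "$base.$i" ]; do
        cat -- "$base.$i"
        i=$(( i + 1 ))
    done
}

# Demo in a scratch directory: 12 tiny parts, each holding its own index.
tmp=$(mktemp -d)
for n in $(seq 0 11); do printf '%s ' "$n" > "$tmp/demo.$n"; done
join_parts "$tmp/demo"; echo
rm -rf "$tmp"
```

For the real file this would be used as join_parts largefile | md5sum, or redirected into a new file once there is enough free space for it.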
Serkan answered Oct 18 '22 16:10
