
Split delimited file into smaller files by column

I'm familiar with the split command in Linux. If I have a file that's 100 lines long,

split -l 5 myfile.txt

...will split myfile.txt into 20 files of 5 lines each and write each piece to its own file.

I want to do the same thing by column. Given a tab-delimited file with 100 columns, is there a similar command that splits it into 20 smaller files, each containing 5 columns and all of the rows?

I know how to use cut, but I'm hoping there's a simple Unix command I've never heard of that does this without wrapping cut in Perl or something similar.

Thanks in advance.

Stephen Turner asked Mar 10 '11 20:03


3 Answers

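#!/bin/bash
# Split a tab-delimited file into pieces of <inc> columns each,
# written to <file>.0, <file>.1, ...

(($# == 2)) || { echo -e "\nUsage: $0 <file to split> <# columns in each split>\n\n" >&2; exit 1; }

infile="$1"

inc=$2
# Count tab-separated fields on the first line (plain awk would also split on spaces)
ncol=$(awk -F'\t' 'NR==1{print NF; exit}' "$infile")

((inc < ncol)) || { echo -e "\nSplit size >= number of columns\n\n" >&2; exit 1; }

# Ceiling division: ncol/inc + 1 writes an extra, empty piece when ncol is a multiple of inc
for ((i=0, start=1, end=inc; i < (ncol + inc - 1)/inc; i++, start+=inc, end+=inc)); do
  cut -f"$start-$end" "$infile" > "${infile}.$i"
done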
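
The loop above re-reads the input once per output file. As a rough alternative, here is a minimal sketch that does the same split in a single pass with awk. The function name split_cols is made up for illustration. It assumes the input is tab-delimited and that your awk can keep one output file open per piece (some awk implementations cap the number of open files).

```shell
# Hypothetical helper: split_cols <file> <columns per piece>
# Writes <file>.0, <file>.1, ... in one pass over the input.
split_cols() {
  awk -F'\t' -v OFS='\t' -v inc="$2" -v base="$1" '
  {
    for (start = 1; start <= NF; start += inc) {
      end = start + inc - 1
      if (end > NF) end = NF              # last piece may be narrower
      line = $start
      for (j = start + 1; j <= end; j++) line = line OFS $j
      out = base "." int((start - 1) / inc)
      print line > out                    # file stays open across rows
    }
  }' "$1"
}
```

Calling split_cols myfile.txt 5 on a 100-column file should produce myfile.txt.0 through myfile.txt.19, the same naming as the cut loop.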
SiegeX answered Sep 20 '22 17:09


If you only need a quick-and-dirty (QAD) solution, this is what I used for my case, a fixed 8-column, semicolon-separated CSV:

#!/bin/bash
# delimiter is ;
cut -d';' -f1 "$1" > "${1}.1"
cut -d';' -f2 "$1" > "${1}.2"
cut -d';' -f3 "$1" > "${1}.3"
cut -d';' -f4 "$1" > "${1}.4"
cut -d';' -f5 "$1" > "${1}.5"
cut -d';' -f6 "$1" > "${1}.6"
cut -d';' -f7 "$1" > "${1}.7"
cut -d';' -f8 "$1" > "${1}.8"
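
If the column count is not fixed, the eight lines above can be folded into a loop. This is only a sketch: the function name split_each_column is invented here, the delimiter defaults to ; as in the script above, and it assumes no quoted fields contain the delimiter (cut does not understand CSV quoting).

```shell
# Hypothetical helper: split_each_column <file> [delimiter]
# Writes one file per column: <file>.1, <file>.2, ...
split_each_column() {
  delim=${2:-';'}
  # Count fields on the first line; fall back to 0 for an empty file
  ncol=$(head -n 1 "$1" | awk -F"$delim" '{print NF}')
  ncol=${ncol:-0}
  i=1
  while [ "$i" -le "$ncol" ]; do
    cut -d"$delim" -f"$i" "$1" > "$1.$i"
    i=$((i + 1))
  done
}
```

For example, split_each_column data.csv would write data.csv.1 through data.csv.8 for an 8-column file.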
zzapper answered Sep 19 '22 17:09


Thanks for the help. I had hoped there would be a Unix command similar to split, but I ended up wrapping the cut command in Perl, following SiegeX's suggestion.

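#!/usr/bin/perl
use strict;
use warnings;

my $help = "\nUsage: $0 <file to split> <# columns in each split>\n\n";
die $help if @ARGV != 2;

my ($infile, $inc) = @ARGV;

# Count tab-separated fields on the first line
# (wc -w counts words, so it miscounts fields that contain spaces)
open(my $in, '<', $infile) or die "Cannot open $infile: $!\n";
my $header = <$in>;
close($in);
die "\n$infile is empty\n\n" unless defined $header;
chomp($header);
my @fields = split /\t/, $header, -1;
my $ncol   = @fields;

die "\nSplit size >= number of columns\n\n" if $inc >= $ncol;

my $start = 1;
my $end   = $start + $inc - 1;

for (my $i = 1; $i < $ncol / $inc + 1; $i++) {
    $end = $ncol if $end > $ncol;
    # List-form pipe open avoids the shell, so odd file names are safe
    open(my $cut, '-|', 'cut', '-f', "$start-$end", $infile) or die "Cannot run cut: $!\n";
    open(my $out, '>', "$infile.$i") or die "Cannot write $infile.$i: $!\n";
    print {$out} $_ while <$cut>;
    close($cut);
    close($out);
    $start += $inc;
    $end   += $inc;
}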
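
One way to sanity-check the result is to glue the pieces back together with paste and compare against the original. A minimal sketch, assuming the pieces are numbered from 1 as in the Perl script and that every row has the same number of tab-separated columns. The function name verify_split is hypothetical.

```shell
# Hypothetical helper: verify_split <original file> <number of pieces>
# Returns 0 if pasting <file>.1 .. <file>.N side by side reproduces <file>.
verify_split() {
  infile=$1
  n=$2
  shift 2
  i=1
  # Build the piece list numerically; a glob would sort .10 before .2
  while [ "$i" -le "$n" ]; do
    set -- "$@" "$infile.$i"
    i=$((i + 1))
  done
  # paste joins with tabs by default; cmp -s is silent and sets the exit status
  paste "$@" | cmp -s - "$infile"
}
```

Something like verify_split myfile.txt 20 && echo "split is lossless" would then confirm that no column was dropped or duplicated.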
Stephen Turner answered Sep 22 '22 17:09