 

Split a file into output files with numeric suffixes that don't begin at zero

Tags:

linux

bash

split

Suppose I have a file temp.txt with 100 lines. I would like to split it into 10 parts. I use the following command:

split -a 1 -l 10 -d temp.txt temp_

But I get temp_0, temp_1, temp_2, ..., temp_9. I want output like this: temp_1, temp_2, ..., temp_10.

From man split I got

-d, --numeric-suffixes
              use numeric suffixes instead of alphabetic

I tried to use split -l 10 --suffix-length=1 --numeric-suffixes=1 Temp.txt temp_

It says split: option '--numeric-suffixes' doesn't allow an argument

Then, I tried to use split -l 10 --suffix-length=1 --numeric-suffixes 1 Temp.txt temp_

It says split: extra operand `temp_'

The output of split --version is

split (GNU coreutils) 8.4
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Torbjörn Granlund and Richard M. Stallman.
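The "doesn't allow an argument" error above comes from the installed split not accepting a start value for --numeric-suffixes. A minimal bash sketch for checking this before picking an approach, assuming GNU-style option parsing; supports_numeric_start is a hypothetical helper name, not part of coreutils:

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the installed split accepts
# --numeric-suffixes=FROM. It runs split on empty input inside a scratch
# directory; older versions such as coreutils 8.4 reject the option with
# "doesn't allow an argument" and exit with a failure status.
supports_numeric_start() {
    local dir
    dir=$(mktemp -d) || return 1
    ( cd "$dir" && split -l 1 --numeric-suffixes=1 - probe_ </dev/null ) 2>/dev/null
    local status=$?          # exit status of the subshell running split
    rm -rf -- "$dir"
    return "$status"
}

if supports_numeric_start; then
    echo "split supports --numeric-suffixes=FROM"
else
    echo "split lacks --numeric-suffixes=FROM; use -d and rename instead"
fi
```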
Mike Brown asked Oct 28 '25 08:10


2 Answers

I tried to use split -a 1 -l 10 -d 1 Temp.txt temp_, but it shows the error split: extra operand `temp_'

-d doesn't take an argument. It should be written as you originally tried:

split -a 1 -l 10 -d Temp.txt temp_
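On a coreutils as old as the 8.4 in the question, this form works but numbers the files from 0. One possible workaround, sketched here under the assumption that the input yields exactly ten chunks and that no other temp_N files exist, is to rename the outputs afterwards, highest suffix first so that no file is overwritten:

```bash
#!/usr/bin/env bash
# Sketch: emulate a start value of 1 on a split that only supports plain -d.
# Assumes a 100-line input (created below as a demo) and 10 lines per chunk.
set -eu

cd "$(mktemp -d)"                      # scratch directory for the demo
seq 1 100 > Temp.txt                   # demo input: 100 numbered lines

split -a 1 -l 10 -d Temp.txt temp_     # creates temp_0 .. temp_9

# Rename from the highest suffix down: temp_9 -> temp_10, ..., temp_0 -> temp_1.
# Going downwards guarantees the target name is always free.
for i in $(seq 9 -1 0); do
    mv "temp_$i" "temp_$((i + 1))"
done

ls temp_*
```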

But, setting the syntax variations aside for a moment:

You're asking it to split a 100-line file into 10 parts, with a suffix length of 1, starting at 1.

^- This scenario is contradictory: a one-digit numeric suffix starting at 1 leaves only the suffixes 1 through 9, so split can write at most 9 files of 10 lines (90 lines), yet you are asking it to process 100 lines.
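The arithmetic behind this can be sketched as a small bash function; capacity is a hypothetical helper for illustration, not something split provides:

```bash
#!/usr/bin/env bash
# Hypothetical helper: the maximum number of lines split can write with
# numeric suffixes of a given length, a given start value and a fixed
# number of lines per output file.
capacity() {
    local suffix_len=$1 start=$2 lines_per_file=$3
    # Usable suffixes run from start up to 10^suffix_len - 1.
    local files=$(( 10 ** suffix_len - start ))
    echo $(( files * lines_per_file ))
}

capacity 1 0 10   # prints 100: suffixes 0-9, which is why plain -d fits
capacity 1 1 10   # prints 90: suffixes 1-9 only
capacity 2 1 10   # prints 990: suffixes 01-99
```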

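If you're willing to extend your allowable suffix length to 2, then you will at least get uniform two-digit suffixes starting at 01:

split -a 2 -l 10 --numeric-suffixes=1 -d Temp.txt temp_

will create temp_01 through temp_10. (Passing a start value to --numeric-suffixes needs a newer coreutils than the 8.4 in the question, which rejects the argument.)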

You can actually omit the -a and -d arguments altogether (the default suffix length is 2):

split -l 10 --numeric-suffixes=1 Temp.txt temp_

will also create temp_01 through temp_10.
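If the exact names temp_1 through temp_10 are required, one option on a coreutils that accepts a start value is to strip the leading zero afterwards. A sketch, assuming fewer than 100 output files and the demo input created below:

```bash
#!/usr/bin/env bash
# Sketch: split with two-digit suffixes starting at 01, then rename
# temp_01 .. temp_09 to temp_1 .. temp_9. temp_10 already has the right name.
set -eu
shopt -s nullglob                      # an unmatched glob expands to nothing

cd "$(mktemp -d)"                      # scratch directory for the demo
seq 1 100 > Temp.txt                   # demo input: 100 numbered lines

split -l 10 --numeric-suffixes=1 Temp.txt temp_   # temp_01 .. temp_10

for f in temp_0[1-9]; do
    mv -- "$f" "temp_${f#temp_0}"      # remove the "temp_0" prefix, keep the digit
done

ls temp_*
```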

If for some reason this is a fixed and absolute requirement (e.g. you are integrating with something else you have no control over), and the file is always going to be exactly 100 lines, then you could do it in two passes:

head -n 90 Temp.txt | split -a 1 -l 10 --numeric-suffixes=1 - temp_
tail -n 10 Temp.txt | split -a 2 -l 10 --numeric-suffixes=10 - temp_

Then you would get temp_1 through temp_10.
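The second pass above takes only the last 10 lines, so it depends on the file being exactly 100 lines long. A sketch of a variant, assuming the file has more than 90 lines and at most 990 (the second pass has the suffixes 10 through 99), uses tail -n +91 to take everything from line 91 onward instead:

```bash
#!/usr/bin/env bash
# Sketch: two-pass split that yields temp_1, temp_2, ... without zero padding
# for inputs of 91 to 990 lines, at 10 lines per file.
set -eu

cd "$(mktemp -d)"                      # scratch directory for the demo
seq 1 150 > Temp.txt                   # demo input longer than 100 lines

# Pass 1: lines 1-90 into temp_1 .. temp_9 (one-digit suffixes from 1).
head -n 90 Temp.txt | split -a 1 -l 10 --numeric-suffixes=1 - temp_
# Pass 2: line 91 onward into temp_10, temp_11, ... (two-digit suffixes from 10).
tail -n +91 Temp.txt | split -a 2 -l 10 --numeric-suffixes=10 - temp_

ls temp_*
```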

hmedia1 answered Oct 29 '25 23:10


Just to throw out a possible alternative, you can accomplish this task manually with a pair of nested loops. The outer loop iterates over the file chunks and the inner loop iterates over the lines within each chunk.

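{
    suf=1;                      # number of the current output file
    read -r; rc=$?;             # read the first line before any test
    while [[ $rc -eq 0 || -n "$REPLY" ]]; do
        line=0;                 # lines written to the current chunk
        while [[ ($rc -eq 0 || -n "$REPLY") && line -lt 10 ]]; do
            printf '%s\n' "$REPLY";
            read -r; rc=$?;     # read ahead so the loop test sees the result
            let ++line;
        done >temp_$suf;        # the whole inner loop writes to temp_N
        let ++suf;
    done;
} <temp.txt;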

Notes:

  • The test $rc -eq 0 || -n "$REPLY" is necessary to continue processing if either we've not yet reached end-of-file (in which case $rc -eq 0 is true) or we have reached end-of-file but there was a non-empty final line in the input file (in which case -n "$REPLY" is true). It's good to support the case of a non-empty final line with no end-of-line delimiter, which sometimes happens. In this case read returns a failing status code but still correctly sets $REPLY to the content of that final line. I've tested the split utility and it correctly handles this case as well.
  • By calling read once prior to the outer loop and then once after each print, we ensure that we always test if the read was successful prior to printing the resulting line. A more naïve design might read and print in immediate succession with no check in between, which would be incorrect.
  • I've used the -r option of read to prevent backslashes from being treated as escape characters, which you probably don't want; I assume you want to preserve the contents of temp.txt verbatim.
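The behaviour described in the first note can be checked directly. This sketch feeds read two lines, the second without a trailing newline, and records read's exit status and $REPLY after each call:

```bash
#!/usr/bin/env bash
# Sketch: read returns a failing status at end-of-file, but still stores an
# unterminated final line in REPLY.
out=$(printf 'first\nlast-without-newline' | {
    read -r; echo "status=$? REPLY=$REPLY"
    read -r; echo "status=$? REPLY=$REPLY"
})
printf '%s\n' "$out"
```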

Obviously there are tradeoffs in this solution. On the one hand, it demands a fair amount of complexity and code verbosity (13 lines the way I've written it). But the advantage is complete control over the behavior of the split operation; you can customize the script to your liking, such as dynamically changing the suffix based on the line number, using a prefix or infix or combination thereof, or even taking into account the contents of the individual file lines in $REPLY.
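For comparison, the same chunking can be sketched with awk, which gives similar per-line control in a single command. This is a swapped-in technique, not part of the loop above; lines and prefix are illustrative variable names, and like the loop's printf '%s\n', awk's print ends every output line with a newline:

```bash
#!/usr/bin/env bash
# Sketch: start a new output file every `lines` input lines, numbering the
# files temp_1, temp_2, ... with no zero padding.
set -eu

cd "$(mktemp -d)"                      # scratch directory for the demo
seq 1 100 > temp.txt                   # demo input: 100 numbered lines

awk -v lines=10 -v prefix=temp_ '
    (NR - 1) % lines == 0 {            # first line of a new chunk
        if (out) close(out)            # close the previous file before opening the next
        out = prefix (++n)             # temp_1, temp_2, ...
    }
    { print > out }
' temp.txt

ls temp_*
```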

bgoldst answered Oct 30 '25 00:10


