
How to split a file using a numeric suffix


Since the primary help from GNU split says:

Usage: /usr/gnu/bin/split [OPTION]... [INPUT [PREFIX]]
Output fixed-size pieces of INPUT to PREFIXaa, PREFIXab, ...; default
size is 1000 lines, and default PREFIX is 'x'.  With no INPUT, or when INPUT
is -, read standard input.

Mandatory arguments to long options are mandatory for short options too.
  -a, --suffix-length=N   generate suffixes of length N (default 2)
      --additional-suffix=SUFFIX  append an additional SUFFIX to file names.
  -b, --bytes=SIZE        put SIZE bytes per output file
  -C, --line-bytes=SIZE   put at most SIZE bytes of lines per output file
  -d, --numeric-suffixes[=FROM]  use numeric suffixes instead of alphabetic.
                                   FROM changes the start value (default 0).
  -e, --elide-empty-files  do not generate empty output files with '-n'
      --filter=COMMAND    write to shell COMMAND; file name is $FILE
  -l, --lines=NUMBER      put NUMBER lines per output file
  -n, --number=CHUNKS     generate CHUNKS output files.  See below
  -u, --unbuffered        immediately copy input to output with '-n r/...'
      --verbose           print a diagnostic just before each
                            output file is opened
      --help     display this help and exit
      --version  output version information and exit

It looks to me like you need to reorganize your options a bit:

split -a 4 -d -l 50000 domains.xml domains_

(From the man page, GNU coreutils 8.21.) What you need seems to be -a/--suffix-length=N (generate suffixes of length N, default 2), not -n/--number=CHUNKS (generate CHUNKS output files):

split -d -l 50000 -a 4 domains.xml domains_

and you should get: domains_0000, domains_0001...
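
To try the option order on something small first, here is a minimal sketch that assumes GNU split. The file sample.txt and the prefixes sample_ and from1_ are hypothetical stand-ins for domains.xml and domains_. The second call uses the long form --numeric-suffixes=FROM from the help text above to start numbering at 1 instead of 0.

```shell
#!/bin/sh
# Sketch only: sample.txt, sample_ and from1_ are made-up names for illustration.
set -eu

workdir=$(mktemp -d)        # work in a scratch directory
cd "$workdir"

seq 1 250 > sample.txt      # 250-line stand-in for domains.xml

# 100 lines per piece, 4-digit numeric suffixes starting at 0000
split -d -a 4 -l 100 sample.txt sample_

# Same split, but numbering starts at 0001 (long option form, GNU split)
split --numeric-suffixes=1 -a 4 -l 100 sample.txt from1_

ls
```

The last piece of each set holds the remaining 50 lines, so three files are produced per call.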


I would use awk. It gives you finer control over your output files and filenames, and it should be just as quick too. Here's how to split a 100-line file into 20-line blocks:

awk 'NR%20==1 { file = FILENAME "_" sprintf("%04d", NR+19) } { print > file }' domains.xml

This should create files named after the input file (FILENAME), like:

domains.xml_0020
domains.xml_0040
domains.xml_0060
domains.xml_0080
domains.xml_0100

Adjust accordingly. HTH.
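
For larger inputs, a variant of the same idea might close each piece before opening the next one, since some awk implementations limit the number of files open at once. This is a sketch under assumptions: sample.txt is a hypothetical 100-line input, and the block size is passed in as an awk variable named size (assumed to be greater than 1).

```shell
#!/bin/sh
# Sketch only: sample.txt and the variable name "size" are made up for illustration.
set -eu

workdir=$(mktemp -d)        # work in a scratch directory
cd "$workdir"

seq 1 100 > sample.txt      # 100-line stand-in for the real input

awk -v size=20 '
  # On the first line of every block, close the previous piece (if any)
  # and build the next name from the input name plus the block end line.
  NR % size == 1 {
    if (file != "") close(file)
    file = sprintf("%s_%04d", FILENAME, NR + size - 1)
  }
  # Every line goes to the current piece.
  { print > file }
' sample.txt

ls sample.txt_*
```

Because FILENAME expands to the name of the input file, the pieces here are called sample.txt_0020 through sample.txt_0100.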


Although you haven't asked for it, I suppose you'd want a proper extension on the resulting files (let us say .xml):

split -d -l 50000 -a 4 --additional-suffix=.xml domains.xml domains_

The --additional-suffix=.xml option produces file names such as domains_0000.xml, domains_1453.xml, etc.
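
A quick way to check the result is to concatenate the pieces and compare them with the original. The sketch below assumes GNU split with --additional-suffix available; sample.xml and the prefix part_ are hypothetical names, and the prefix is chosen so that the glob does not match the input file itself.

```shell
#!/bin/sh
# Sketch only: sample.xml and part_ are made-up names for illustration.
set -eu

workdir=$(mktemp -d)        # work in a scratch directory
cd "$workdir"

seq 1 250 > sample.xml      # 250-line stand-in for domains.xml

# 100 lines per piece, 4-digit numeric suffix, then ".xml" appended
split -d -a 4 -l 100 --additional-suffix=.xml sample.xml part_

ls part_*.xml

# Zero-padded suffixes sort correctly in a glob, so the pieces
# concatenate back in their original order.
cat part_*.xml | cmp - sample.xml && echo "pieces reassemble to the original"
```

The fixed-width, zero-padded suffix is what makes the glob order match the split order; without -a, more than 100 pieces would need a longer suffix.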