To split a file into pieces, you simply use the split command. By default, the split command uses a very simple naming scheme. The file chunks will be named xaa, xab, xac, etc., and if the file is large enough the names continue through the alphabet. What happens once the two-letter suffixes run out depends on the split implementation.
The split command in Linux is used to split large files into smaller files. By default it writes 1000 lines per output file, and it lets users change the number of lines as required.
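The default behaviour described above can be tried on a throwaway file. This is only a minimal sketch: sample.txt and the scratch directory are made-up names for illustration, and it assumes seq and mktemp are available.

```shell
# Scratch directory so the chunks do not clutter the current one.
workdir=$(mktemp -d)

# Hypothetical input: 2500 numbered lines.
seq 1 2500 > "$workdir/sample.txt"

# No options: 1000 lines per chunk, prefix "x", two-letter suffixes.
# Run in a subshell so the caller's working directory is unchanged.
(cd "$workdir" && split sample.txt)

# Line counts per chunk.
(cd "$workdir" && wc -l x??)
```

With 2500 input lines, xaa and xab should hold 1000 lines each and xac the remaining 500.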
Since the primary help from GNU split says:
Usage: /usr/gnu/bin/split [OPTION]... [INPUT [PREFIX]]
Output fixed-size pieces of INPUT to PREFIXaa, PREFIXab, ...; default
size is 1000 lines, and default PREFIX is 'x'. With no INPUT, or when INPUT
is -, read standard input.

Mandatory arguments to long options are mandatory for short options too.
  -a, --suffix-length=N   generate suffixes of length N (default 2)
      --additional-suffix=SUFFIX  append an additional SUFFIX to file names.
  -b, --bytes=SIZE        put SIZE bytes per output file
  -C, --line-bytes=SIZE   put at most SIZE bytes of lines per output file
  -d, --numeric-suffixes[=FROM]  use numeric suffixes instead of alphabetic.
                                   FROM changes the start value (default 0).
  -e, --elide-empty-files  do not generate empty output files with '-n'
      --filter=COMMAND    write to shell COMMAND; file name is $FILE
  -l, --lines=NUMBER      put NUMBER lines per output file
  -n, --number=CHUNKS     generate CHUNKS output files. See below
  -u, --unbuffered        immediately copy input to output with '-n r/...'
      --verbose           print a diagnostic just before each
                            output file is opened
      --help     display this help and exit
      --version  output version information and exit
It looks to me like you need to reorganize your options a bit:
split -a 4 -d -l 50000 domains.xml domains_
(From the man page, GNU coreutils 8.21) What you need seems to be -a/--suffix-length=N (generate suffixes of length N (default 2)), not -n/--number=CHUNKS (generate CHUNKS output files):
split -d -l 50000 -a 4 domains.xml domains_
and you should get: domains_0000, domains_0001...
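To see the naming without a real 50000-line input, the same options can be run on a small stand-in file. A rough sketch, assuming GNU split (for -d); the 120-line domains.xml here is generated filler, not real XML, and the chunk size is scaled down to 50 lines.

```shell
# Scratch directory and a stand-in input of 120 numbered lines.
workdir=$(mktemp -d)
seq 1 120 > "$workdir/domains.xml"

# -d: numeric suffixes, -a 4: four-character suffixes, -l 50: 50 lines per chunk.
split -d -a 4 -l 50 "$workdir/domains.xml" "$workdir/domains_"

# List what was produced.
ls "$workdir"
```

This should leave domains_0000 and domains_0001 with 50 lines each and domains_0002 with the last 20.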
I would use awk. It gives you finer control over your output files and filenames. It should be just as quick too. Here's how to split a 100 line file into 20 line blocks:
awk 'NR%20==1 { file = FILENAME "_" sprintf("%04d", NR+19) } { print > file }' domains.xml
This should create files named after the input file, like:
domains.xml_0020
domains.xml_0040
domains.xml_0060
domains.xml_0080
domains.xml_0100
Adjust accordingly. HTH.
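One possible adjustment, sketched here as a variation rather than the command from the answer itself: pass the block size in as a variable and close each output file before starting the next, which avoids running into open-file limits in some awk implementations when there are many chunks. The variable name size and the generated 100-line input are assumptions for illustration.

```shell
# Scratch directory and a stand-in input of 100 numbered lines.
workdir=$(mktemp -d)
seq 1 100 > "$workdir/domains.xml"

# size is a hypothetical awk variable holding the lines per block.
awk -v size=20 '
  (NR - 1) % size == 0 {
      if (file) close(file)                  # finish the previous chunk
      file = sprintf("%s_%04d", FILENAME, NR + size - 1)
  }
  { print > file }                           # write the line to the current chunk
' "$workdir/domains.xml"

ls "$workdir"
```

Because FILENAME is the path given on the command line, the chunks land next to the input as domains.xml_0020 through domains.xml_0100.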
Although you haven't asked for it, I suppose you'd want a proper extension on the resulting files (let us say xml):
split -d -l 50000 -a 4 --additional-suffix=.xml domains.xml domains_
The --additional-suffix=.xml option will produce file names such as domains_0000.xml, domains_1453.xml, etc.
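A quick way to check the result, sketched on a small generated file: split with the extra suffix, then concatenate the pieces in name order and compare them with the original. This assumes a GNU split recent enough to support --additional-suffix; the 120-line input and 50-line chunk size are stand-ins.

```shell
# Scratch directory and a stand-in input of 120 numbered lines.
workdir=$(mktemp -d)
seq 1 120 > "$workdir/domains.xml"

# Same options as above, scaled down to 50 lines per chunk.
split -d -a 4 -l 50 --additional-suffix=.xml "$workdir/domains.xml" "$workdir/domains_"

# The glob expands in sorted order, so fixed-width numeric suffixes
# put the pieces back in their original sequence.
cat "$workdir"/domains_*.xml | cmp - "$workdir/domains.xml" \
  && echo "pieces match the original"
```

Note that splitting by line count does not keep each piece a well-formed XML document; the check only confirms that no lines were lost or reordered.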