I have a CSV import file with 33 million lines that need to be imported into my database. I can import it with a C# console app, but the stored procedures that run after the import then time out. Consequently, I want to split the file into 10 smaller files.
I could do it in C#, but I suspect there's a much better approach using shell utilities. I have Cygwin installed and can use all the common Linux shell utilities. Is there a neat little combination of commands I could use to split the file?
To split a file into pieces, you simply use the split command. By default, split uses a very simple naming scheme: the chunks are named xaa, xab, xac, and so on, and if the file is large enough, the two-letter suffixes run all the way from xaa to xzz (up to 676 chunks).
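For example, assuming a hypothetical input file named bigfile.csv, running split with no options produces 1,000-line chunks named xaa, xab, xac, and so on in the current directory:

$ split bigfile.csv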
If you use the -l (a lowercase L) option, replace linenumber in split -l linenumber with the number of lines you'd like in each of the smaller files (the default is 1,000). If you use the -b option, replace bytes in split -b bytes with the number of bytes you'd like in each of the smaller files.
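For instance, with the same hypothetical bigfile.csv, either option works; GNU split (as shipped with Cygwin's coreutils) also accepts size suffixes such as K and M with -b:

$ split -l 1000000 bigfile.csv    # chunks of 1 million lines each
$ split -b 100M bigfile.csv       # chunks of 100 megabytes each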
To split a file into two equal pieces, use the -n option: specifying -n 2 splits the file into two files of equal size.
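One caveat, since the input here is a CSV: -n 2 splits by byte count and can cut a record in half. GNU split also accepts the l/N chunk format, which splits into N files without breaking lines:

$ split -n 2 bigfile.csv      # two equal-size halves, may split a record
$ split -n l/2 bigfile.csv    # two files made of whole lines only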
Use split
- e.g. to split a file every 3.4 million lines (should give you 10 files):
split -l 3400000 filename.csv
$ man split
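Putting it together for the 33-million-line file (filenames are hypothetical; --numeric-suffixes and --additional-suffix are GNU split options available in Cygwin's coreutils):

$ wc -l import.csv    # confirm the line count first
$ split -l 3400000 --numeric-suffixes=1 --additional-suffix=.csv import.csv part_
$ ls part_*
part_01.csv  part_02.csv  ...  part_10.csv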