I have a large file which contains data for 10 years. I want to split it into files that contain 1 year of data each.
The data in the file is in the following format:
GBPUSD,20100201,000200,1.5969,1.5969,1.5967,1.5967,4
GBPUSD,20100201,000300,1.5967,1.5967,1.5960,1.5962,4
Characters 8-11 contain the year. I would like to use that as the filename, with .txt on the end: 2011.txt, 2012.txt, etc.
The file contains around 4 million rows.
I'm using Ubuntu Linux.
On Linux, the split command breaks a large file into smaller pieces of a fixed size. By default each output file holds 1,000 lines and the output names start with the prefix 'x' (xaa, xab, and so on).
The -l option (a lowercase L) sets the number of lines in each output file (the default is 1,000). The -b option sets the number of bytes in each output file instead. Note that split divides a file only by line count or size, so on its own it cannot group rows by the year they contain.
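As a rough illustration of the -l option described above, the sketch below splits a tiny made-up file into two-line pieces. The file name data.txt, the prefix part_, and the scratch directory are assumptions chosen for the example, not anything from the question.

```shell
# Work in a throwaway directory so nothing real is overwritten (assumption for the demo)
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical 5-line input file
printf 'line %s\n' 1 2 3 4 5 > data.txt

# Split into pieces of at most 2 lines each, named part_aa, part_ab, part_ac
split -l 2 data.txt part_

# Show how many lines ended up in each piece
wc -l part_*
```

The last piece simply holds whatever lines are left over, which is why split alone does not line up with year boundaries.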
Here's one way using awk:
awk '{ print > (substr($0,8,4) ".txt") }' file
If the length of the first field can vary, you may prefer:
awk -F, '{ print > (substr($2,1,4) ".txt") }' file
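A quick way to try the field-based approach before running it on the 4 million row file is sketched below. The first two rows come from the question; the 2011 row, the file name sample.csv, and the scratch directory are made up for the demonstration. The start index passed to substr is 1 because awk strings are indexed from 1, and the output file name is parenthesized because an unparenthesized concatenation after > is not parsed the same way by every awk.

```shell
# Work in a throwaway directory (assumption for the demo)
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two rows from the question plus one invented 2011 row
cat > sample.csv <<'EOF'
GBPUSD,20100201,000200,1.5969,1.5969,1.5967,1.5967,4
GBPUSD,20100201,000300,1.5967,1.5967,1.5960,1.5962,4
GBPUSD,20110103,000100,1.5601,1.5602,1.5600,1.5601,4
EOF

# Write each row to <year>.txt, taking the year from the first 4 characters of field 2
awk -F, '{ print > (substr($2,1,4) ".txt") }' sample.csv

# Sanity check: the per-year files together should hold as many rows as the input
total_in=$(cat sample.csv | wc -l)
total_out=$(cat [0-9][0-9][0-9][0-9].txt | wc -l)
echo "input rows: $total_in, output rows: $total_out"
```

Comparing the two row counts is a cheap check that no lines were dropped or duplicated by the split.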