 

Quickest way to split a large file based on text within the file in Linux

Tags:

linux

bash

sed

awk

I have a large file which contains data for 10 years. I want to split it into files that contain 1 year of data each.

The data in the file is in the following format:

GBPUSD,20100201,000200,1.5969,1.5969,1.5967,1.5967,4
GBPUSD,20100201,000300,1.5967,1.5967,1.5960,1.5962,4

Characters 8-11 contain the year. I would like to use that as the filename, with .txt on the end, e.g. 2011.txt, 2012.txt, etc.

The file contains around 4 million rows.

I'm using Ubuntu Linux.

asked Feb 03 '13 by zio

People also ask

How do I break a large file into smaller parts in Linux?

To split large files into small pieces, use the split command. By default it writes output files of a fixed size: 1,000 lines per piece, named with the prefix 'x' (xaa, xab, xac, and so on).
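For example, with the defaults (bigfile.txt is an illustrative name, not from the question):

split bigfile.txt
# writes xaa, xab, xac, ... with 1,000 lines each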

How do I split a large text file in Unix?

If you use the -l (a lowercase L) option, as in split -l linenumber filename, replace linenumber with the number of lines you'd like in each of the smaller files (the default is 1,000). If you use the -b option, as in split -b bytes filename, replace bytes with the number of bytes you'd like in each of the smaller files.
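For instance (bigfile.txt and the chunk_ prefix are illustrative names, not from the question):

split -l 100000 bigfile.txt chunk_   # pieces of 100,000 lines: chunk_aa, chunk_ab, ...
split -b 50M bigfile.txt chunk_      # pieces of about 50 MB each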


1 Answer

Here's one way using awk: each input line is written to a file named after characters 8-11 of the line (the year):

awk '{ print > (substr($0,8,4) ".txt") }' file

If the length of the first field can vary, you may prefer:

awk -F, '{ print > (substr($2,1,4) ".txt") }' file
answered Oct 14 '22 by Steve
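As a side note (not part of the original answer): the approach above keeps every output file open at once, which is fine for ten years of data. If you ever split on a key with many more distinct values and the input is sorted on that key, a variation that closes each file once its block ends stays under awk's open-file-descriptor limit. A sketch, assuming the same comma-separated layout:

awk -F, '{
    out = substr($2, 1, 4) ".txt"                      # year taken from the second field, e.g. 2010.txt
    if (out != prev) { if (prev != "") close(prev); prev = out }
    print > out
}' file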