
Efficient way to get n middle lines from a very big file

Tags: unix, head, tail

I have a big file, around 60GB.

I need to get n lines from the middle of the file. I am using a command with tail and head like

tail -n m file | head -n n > output.txt

where m and n are numbers.

The general structure of the file is as below: a set of records with comma-separated columns. Each line can be a different length (say, at most 5000 characters).

col1,col2,col3,col4...col10

Is there any other way to take the n middle lines in less time? The current command takes a long time to execute.

Mahesh asked Dec 09 '13 07:12


3 Answers

With sed you can at least remove the pipeline:

sed -n '600000,700000p' file > output.txt

will print lines 600000 through 700000.
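As written, sed keeps reading to the end of the file after the last wanted line. A minimal sketch of stopping early with the `q` command follows; the temp directory, `sample.txt`, and the values of `first` and `last` are made up for illustration, and `seq` is assumed to be available.

```shell
# Hypothetical sample data: 100 numbered lines in a temp directory.
workdir=$(mktemp -d)
seq 1 100 > "$workdir/sample.txt"

first=40   # first line to keep (illustrative value)
last=45    # last line to keep (illustrative value)

# Print lines first..last; the trailing "q" quits once line $last is reached,
# so sed does not read the rest of the file.
sed -n "${first},${last}p;${last}q" "$workdir/sample.txt" > "$workdir/output.txt"
```

On a 60GB file this should only help when the wanted range ends well before the end of the file, since sed still has to read every line up to `last`.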

perreal answered Nov 12 '22 08:11


awk 'FNR>=n && FNR<=m' file

where n and m are replaced with the first and last line numbers to print.
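One way to avoid editing the numbers into the program text is to pass them with `-v`, and to exit once the range is done so awk does not scan the rest of the file. This is only a sketch: the temp directory, `sample.txt`, and the values 40 and 45 are illustrative, and `seq` is assumed to be available.

```shell
# Hypothetical sample data: 100 numbered lines in a temp directory.
workdir=$(mktemp -d)
seq 1 100 > "$workdir/sample.txt"

# n = first line to print, m = last line to print (illustrative values).
# Stop reading as soon as we are past line m; otherwise print lines from n on.
awk -v n=40 -v m=45 'FNR > m { exit } FNR >= n' "$workdir/sample.txt" > "$workdir/output.txt"
```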

Anitha Mani answered Nov 12 '22 09:11


It might be more efficient to use the split utility, because with tail and head in a pipe you scan some parts of the file twice.

Example

split -l <k> <file> <prefix>

Where k is the number of lines you want to have in each file, and the (optional) prefix is added to each output file name.
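A small sketch of how the pieces come out, assuming the default two-letter suffixes (`aa`, `ab`, ...); the temp directory, `sample.txt`, the `part_` prefix, and the chunk size of 25 are made up for illustration, and `seq` is assumed to be available.

```shell
# Hypothetical sample data: 100 numbered lines in a temp directory.
workdir=$(mktemp -d)
seq 1 100 > "$workdir/sample.txt"

# 25 lines per chunk; outputs are named part_aa, part_ab, part_ac, ...
split -l 25 "$workdir/sample.txt" "$workdir/part_"

# Lines 26-50 of the original now sit in the second chunk.
cat "$workdir/part_ab" > "$workdir/output.txt"
```

Note that split writes every chunk to disk, so it copies the whole input. This probably only pays off if the wanted range lines up with a chunk boundary or if several ranges are needed from the same file.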

Rajish answered Nov 12 '22 07:11