bash pull certain lines from a file

Tags: bash, file-io

I was wondering if there is a more efficient way to get this task done. I am working with files with the number of lines ranging from a couple hundred thousand to a couple million. Say I know that lines 100,000 - 125,000 are the lines that contain the data I am looking for. I would like to know if there is a quick way to pull just these desired lines from the file. Right now I am using a loop with grep like this:

for ((i=$start_fid; i<=$end_fid; i++))
do
    grep "^$i " fulldbdir_new >> new_dbdir${bscnt}
done

This works fine; it is just taking longer than I would like. The lines contain more than just numbers: each line has about 10 fields, the first being a sequential integer that appears only once per file.

I am comfortable writing in C if necessary.

mike asked Jul 25 '11 19:07



3 Answers

sed can do the job...

sed -n '100000,125000p' input

EDIT: As per glenn jackman's suggestion, sed can quit as soon as the range has been printed instead of reading the rest of the file:

sed -n '100000,125000p; 125001q' input
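When the range bounds live in shell variables, as they do in the question's loop, the sed script has to be double-quoted so the shell can expand them. A minimal sketch, assuming a hypothetical helper named print_line_range (the name and argument order are illustrative, not part of the answer):

```shell
#!/usr/bin/env bash

# print_line_range START END FILE  (hypothetical helper)
# Prints lines START..END of FILE and stops reading after line END.
print_line_range() {
    local start=$1 end=$2 file=$3
    # Double quotes let the shell expand the variables before sed sees the script.
    # On line END, the p command prints first and q then quits; with -n, q prints nothing extra.
    sed -n "${start},${end}p; ${end}q" "$file"
}

# Small demonstration on a throwaway 20-line file.
demo_file=$(mktemp)
seq 1 20 > "$demo_file"
print_line_range 5 8 "$demo_file"
rm -f "$demo_file"
```

Quitting on line END rather than on END+1 is a small variation on the command above: it produces the same output and reads one line fewer.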

Costa answered Sep 28 '22 15:09


You can try a combination of tail and head to get the correct lines.

head -n 125000 file_name | tail -n 25001 | grep "^$i "
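The 25001 passed to tail is the size of the inclusive range: 125000 - 100000 + 1. A short sketch of the same pipeline with the count computed from the bounds, assuming a hypothetical helper named slice_lines:

```shell
#!/usr/bin/env bash

# slice_lines START END FILE  (hypothetical helper)
# Prints lines START..END of FILE using head and tail.
slice_lines() {
    local start=$1 end=$2 file=$3
    # Both ends are included, hence the +1.
    local count=$(( end - start + 1 ))
    # head stops reading after line END; tail keeps the last COUNT of those lines.
    # Assumes FILE has at least END lines, otherwise tail returns the wrong slice.
    head -n "$end" "$file" | tail -n "$count"
}

# Small demonstration on a throwaway 20-line file.
demo_file=$(mktemp)
seq 1 20 > "$demo_file"
slice_lines 5 8 "$demo_file"
rm -f "$demo_file"
```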

Don't forget perl either.

perl -ne 'print if $. >= 100000 && $. <= 125000' file_name | grep "^$i "

Or a faster Perl version that stops reading once the last line of the range has been printed:

perl -ne 'print if $. >= 100000; exit if $. >= 125000' file_name | grep "^$i "

Also, instead of a for loop you might want to look into using GNU parallel.

gpojd answered Sep 28 '22 17:09


I'd use awk:

awk 'NR >= 100000; NR == 125000 {exit}' file

For big numbers you can also use E notation:

awk 'NR >= 1e5; NR == 1.25e5 {exit}' file

EDIT: @glenn jackman's suggestion (cf. comment)
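If the integers in the first field do not coincide with line numbers, the same single-pass idea can filter on that field instead of on NR. A minimal sketch, assuming the IDs appear in ascending order (the question only says they are sequential and unique) and using a hypothetical helper named extract_by_id:

```shell
#!/usr/bin/env bash

# extract_by_id START END FILE  (hypothetical helper)
# Prints every line whose first field lies in START..END, reading FILE once.
extract_by_id() {
    local start=$1 end=$2 file=$3
    awk -v s="$start" -v e="$end" '
        $1 > e  { exit }   # assumes ascending IDs, so nothing further can match
        $1 >= s            # default action: print the line
    ' "$file"
}

# Small demonstration: IDs 101..120, each followed by a dummy field.
demo_file=$(mktemp)
seq 101 120 | sed 's/$/ data/' > "$demo_file"
extract_by_id 105 107 "$demo_file"
rm -f "$demo_file"
```

With the names from the question, the whole for loop would then reduce to a single call such as extract_by_id "$start_fid" "$end_fid" fulldbdir_new >> "new_dbdir${bscnt}", which scans the file once instead of once per ID.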

mhyfritz answered Sep 28 '22 17:09