
Fast way to find string in file in unix

Tags:

unix

I want to find a string pattern in a file on Unix. I use the command below:

$ grep 2005057488 filename

But the file contains millions of lines, and I have many such files. What is the fastest way to find the pattern, other than grep?

sandeep7289 asked Nov 29 '12 09:11


2 Answers

grep is generally as fast as it gets. It's designed to do one thing and one thing only, and it does it very well. You can read why here.

However, to speed things up there are a couple of things you could try. Firstly, it looks like the pattern you're looking for is a fixed string. Fortunately, grep has a 'fixed-strings' option:

-F, --fixed-strings
       Interpret PATTERN as a list of fixed strings, separated by newlines, any of which is to be matched. (-F is specified by POSIX.)

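Secondly, because grep can be considerably slower in a UTF-8 locale, you could try disabling national language support (NLS) by setting the environment variable LANG=C. (If LC_ALL is set in your environment, it overrides LANG, so set LC_ALL=C instead.) Therefore, you could try this concoction: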

LANG=C grep -F "2005057488" file

Thirdly, it wasn't clear in your question, but if you're only trying to find out whether something exists in your file, you could also cap the number of matches with -m. With -m 1, grep quits immediately after the first match is found. Your command could now look like this:

LANG=C grep -m 1 -F "2005057488" file
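As a rough illustration of the three suggestions so far, here is a minimal sketch run against a throwaway file. The sample contents and variable names are made up for the demo. The -q check at the end is an addition not mentioned above: it is the standard way to ask grep for a plain yes/no answer.

```shell
# Throwaway sample file; the contents are made up for illustration.
sample=$(mktemp)
printf '%s\n' 'id=2005057488 first' 'id=1111111111' 'id=2005057488 second' > "$sample"

# -F treats the pattern as a literal string; LANG=C selects the C locale.
all_matches=$(LANG=C grep -F '2005057488' "$sample")

# -m 1 stops after the first matching line.
first_match=$(LANG=C grep -m 1 -F '2005057488' "$sample")

# -q prints nothing and reports the result through the exit status,
# which is enough for a yes/no existence check.
if LANG=C grep -q -F '2005057488' "$sample"; then
    found=yes
else
    found=no
fi

printf 'all matches:\n%s\n' "$all_matches"
printf 'first match: %s\n' "$first_match"
printf 'found: %s\n' "$found"

rm -f "$sample"
```

Whether any of these flags makes a measurable difference depends on your grep version and data, so it is worth timing them on one of your real files.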

Finally, if you have a multicore CPU, you could give GNU parallel a go. It even comes with an explanation of how to use it with grep. To run 1.5 jobs per core and give 1000 arguments to grep:

find . -type f | parallel -k -j150% -n 1000 -m grep -H -n STRING {}

To grep a big file in parallel use --pipe:

< bigfile parallel --pipe grep STRING

Depending on your disks and CPUs it may be faster to read larger blocks:

< bigfile parallel --pipe --block 10M grep STRING
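If GNU parallel isn't installed, xargs -P can serve as a rough substitute for the many-files case. This is a swapped-in technique, not what parallel does internally: -P is not part of POSIX (GNU and BSD xargs support it), and the output order across processes is not guaranteed. The function name search_tree, the 4-process limit, and the demo files below are all assumptions for the sake of the sketch.

```shell
# Hypothetical helper: search every regular file under a directory for a
# literal string, running up to 4 grep processes at once.
search_tree() {
    dir=$1
    pattern=$2
    # -print0 / -0 keep filenames with spaces or newlines intact.
    # -P 4: up to 4 concurrent grep processes (an arbitrary choice here).
    # -n 1000: at most 1000 filenames per grep invocation.
    # -H: always print the filename, even if a batch holds a single file.
    find "$dir" -type f -print0 |
        LANG=C xargs -0 -P 4 -n 1000 grep -H -F -e "$pattern"
}

# Demo on throwaway files with made-up contents.
demo_dir=$(mktemp -d)
printf '%s\n' 'foo' '2005057488 here' > "$demo_dir/a.log"
printf '%s\n' 'nothing to see' > "$demo_dir/b.log"
printf '%s\n' '2005057488 again' > "$demo_dir/c.log"

# Sort the output because parallel processes may finish in any order.
hits=$(search_tree "$demo_dir" '2005057488' | sort)
printf '%s\n' "$hits"

rm -rf "$demo_dir"
```

Note that this only parallelizes across files. It does not split a single large file the way parallel --pipe does.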
Steve answered Oct 24 '22 23:10


grep works faster than sed.

$ grep 2005057488 filename
$ sed -n '/2005057488/p' filename

Still, both work to find that particular string in a file.
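To check that the two commands really are interchangeable on your data, a small sketch like the following may help. The sample file and variable names are made up. The last sed command is an addition: it uses the p;q idiom to quit after the first match, roughly mirroring grep -m 1.

```shell
# Throwaway sample file; the contents are made up for illustration.
sample=$(mktemp)
printf '%s\n' 'a 2005057488' 'b' 'c 2005057488' > "$sample"

# Print every matching line with each tool.
grep_all=$(grep -F '2005057488' "$sample")
sed_all=$(sed -n '/2005057488/p' "$sample")

# Print only the first matching line, then stop reading.
grep_first=$(grep -m 1 -F '2005057488' "$sample")
sed_first=$(sed -n '/2005057488/{p;q;}' "$sample")

if [ "$grep_all" = "$sed_all" ] && [ "$grep_first" = "$sed_first" ]; then
    same=yes
else
    same=no
fi
printf 'grep and sed agree: %s\n' "$same"

rm -f "$sample"
```

To compare speed, you could prefix each command with time on one of your real files. Results will vary with the tool versions and the data.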

Sreekumar answered Oct 24 '22 23:10