Faster Alternative to Unix Grep

Question

I'm trying to do the following:

$ grep ">" file.fasta > output.txt

But it takes a very long time when the input FASTA file is large.

The input file looks like this:

>seq1
ATCGGTTA
>seq2
ATGGGGGG

Is there a faster alternative?

Debaditya · Accepted Answer

Use the time command with each of these:

$ time grep ">" file.fasta > output.txt

$ time egrep ">" file.fasta > output.txt

$ time awk '/^>/{print $0}' file.fasta > output.txt

The awk version only matches when ">" is the first character of the line.

If you compare the timings, they are almost the same.

In my opinion, if the data is in columnar format, use awk to search.

wildplasser · Answer

Hand-built state machine. If you only want '>' to be accepted at the beginning of the line, you'll need one more state. If you need to recognise '\r' too, you will need a few more states.

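#include <stdio.h>

int main(void)
{
    int state, ch;

    for (state = 0; (ch = getc(stdin)) != EOF; ) {
        switch (state) {
        case 0: /* start: skip input until a '>' is seen */
            if (ch == '>') state = 1;
            else break;
            /* fall through: the '>' itself is echoed */
        case 1: /* echo: copy characters up to and including the newline */
            fputc(ch, stdout);
            if (ch == '\n') state = 0;
            break;
        }
    }
    /* terminate the output if the input ended inside an echoed line */
    if (state == 1) fputc('\n', stdout);
    return 0;
}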
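
The remark about needing one more state to accept '>' only at the beginning of a line can be sketched roughly as follows. This is an illustration, not part of the original answer: the function name copy_header_lines and the state names LINE_START, ECHO and SKIP are made up here, and the code assumes plain '\n' line endings.

```c
#include <stdio.h>

/* States of the line-anchored filter (names are illustrative). */
enum state {
    LINE_START, /* the next character is the first one of a line */
    ECHO,       /* inside a header line: copy characters through */
    SKIP        /* inside any other line: discard until the newline */
};

/* Copy every line of `in` that begins with '>' to `out`.
 * Returns the number of header lines copied. */
long copy_header_lines(FILE *in, FILE *out)
{
    enum state st = LINE_START;
    long count = 0;
    int ch;

    while ((ch = getc(in)) != EOF) {
        switch (st) {
        case LINE_START:
            if (ch == '>') {
                st = ECHO;
                count++;
                putc(ch, out);
            } else if (ch != '\n') {
                st = SKIP;
            }
            /* an empty line leaves the state at LINE_START */
            break;
        case ECHO:
            putc(ch, out);
            if (ch == '\n') st = LINE_START;
            break;
        case SKIP:
            if (ch == '\n') st = LINE_START;
            break;
        }
    }
    /* Terminate a final header line that lacked a newline. */
    if (st == ECHO) putc('\n', out);
    return count;
}
```

Calling copy_header_lines(stdin, stdout) from main() would give a filter that behaves like grep "^>" for this kind of input, since a '>' in the middle of a sequence line is now ignored.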

If you want real speed, you could replace fgetc() and fputc() with their macro equivalents getc() and putc(), though I think trivial programs like this will be I/O bound anyway.
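
Another way to cut per-character overhead, beyond the macro versions, is to read the input in large chunks and let memchr() find the line ends. The following is only a sketch under assumptions: the function name copy_header_lines_chunked and the 64 KiB chunk size are arbitrary choices made here, it matches '>' only at the start of a line, and it has not been benchmarked against grep, so any gain would need to be measured with time on the real file.

```c
#include <stdio.h>
#include <string.h>

#define CHUNK_SIZE (1 << 16) /* 64 KiB per read; an arbitrary choice */

/* Copy every line of `in` that begins with '>' to `out`, reading in
 * large chunks rather than one character at a time.
 * Returns the number of header lines copied.
 * Uses a static buffer, so it is not reentrant. */
long copy_header_lines_chunked(FILE *in, FILE *out)
{
    static char buf[CHUNK_SIZE];
    int at_line_start = 1; /* next byte is the first byte of a line */
    int copying = 0;       /* the current line is a header line */
    long count = 0;
    size_t len;

    while ((len = fread(buf, 1, sizeof buf, in)) > 0) {
        char *p = buf;
        char *end = buf + len;

        while (p < end) {
            if (at_line_start) {
                copying = (*p == '>');
                if (copying) count++;
                at_line_start = 0;
            }
            /* Find the end of the current line within this chunk. */
            char *nl = memchr(p, '\n', (size_t)(end - p));
            char *stop = nl ? nl + 1 : end;
            if (copying) fwrite(p, 1, (size_t)(stop - p), out);
            if (nl) at_line_start = 1;
            p = stop;
        }
    }
    /* Terminate a final header line that lacked a newline. */
    if (copying && !at_line_start) putc('\n', out);
    return count;
}
```

The two flags carry the line state across chunk boundaries, so a header line that straddles two reads is still copied in full.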

Tags: grep, unix

neversaint
