How to catch the start of each sequence in a file

Tags: grep, bash, sed, awk

I have a text file that looks like this:

      125
      126
      127    {
      566
      567
      568
      569       # blah blah
      570    {  #blah blah
      700
      701    {

The numbers are left-aligned and the pattern is always the same: each sequence is increasing, and its last line ends with a curly brace. I need to catch just the starting number of each sequence. A brace is always present, and only on the line that ends a sequence. The file starts as shown, with '125'.

In short, I need:

      125
      566
      700

What I have come up with:

      grep -A1 '{' input_file | grep -v '{' | grep -oE '^[0-9]+'

but this misses '125', because no '{' line precedes it. I worked around that by inserting a line containing '{' at the head of the input.
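The workaround can be sketched as a complete pipeline. This is a minimal sketch, assuming bash and a grep that supports `-A`, `-o` and `-E` (GNU or BSD); the helper names `sample` and `starts_with_workaround` are hypothetical and only for illustration. Prepending a `{` line makes `grep -A1` emit `125` as context, just like the other starts. The `--` separators that `grep -A1` prints between non-adjacent groups survive `grep -v '{'` but are dropped by the final `grep -oE`.

```shell
#!/usr/bin/env bash

# Hypothetical helper: the sample data from the question.
sample() {
  printf '%s\n' 125 126 '127    {' 566 567 568 \
    '569       # blah blah' '570    {  #blah blah' 700 '701    {'
}

# Hypothetical helper: reads file arguments, or stdin when none are given.
starts_with_workaround() {
  { echo '{'; cat "$@"; } |  # put a brace line in front of the data
    grep -A1 '{' |           # each brace line plus the line after it
    grep -v '{' |            # drop the brace lines themselves
    grep -oE '^[0-9]+'       # keep the leading number; drops the -- separators
}

sample | starts_with_workaround
```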

I would like to reduce this to a single regex.
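A single regex is possible if the whole file is matched as one string. This sketch assumes GNU grep built with PCRE support (`-P`); BSD and macOS grep do not offer `-P`. With `-z`, grep uses NUL as the record separator, so a file without NUL bytes is one record and the pattern can span line breaks. `\A` anchors the first start at the beginning of the input, `\K` drops the brace-line prefix from the printed match, and `tr` turns the NUL-terminated matches back into lines. The helper names are hypothetical.

```shell
#!/usr/bin/env bash

# Hypothetical helper: the sample data from the question.
sample() {
  printf '%s\n' 125 126 '127    {' 566 567 568 \
    '569       # blah blah' '570    {  #blah blah' 700 '701    {'
}

# Hypothetical helper; requires GNU grep with -P.
#   \A             start of the whole input (the first sequence start), or
#   \{[^\n]*\n     a brace, the rest of its line, and the line break
#   \K             forget that prefix, so only what follows is printed
#   [0-9]+         the number that starts the next sequence
starts_single_regex() {
  grep -zoP '(?:\A|\{[^\n]*\n)\K[0-9]+' "$@" | tr '\0' '\n'
}

sample | starts_single_regex
```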

Suggestions and better algorithms are welcome.

Gil asked Jul 10 '12 12:07



2 Answers

awk 'BEGIN {p=1} p==1 {print $1; p=0} /[{]/ {p=1}' input_file

Output:
125
566
700

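Given the file format above, you can use awk with a flag variable that records when an opening { has just been seen: the next line is then the start of a sequence. The flag is set in BEGIN so that the first line is printed as well.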
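The flag can also be replaced by remembering the previous line. This variant is an illustration, not the answerer's code; `sample` and `starts_awk_prev` are hypothetical helpers. It prints the first field of line 1 and of every line whose predecessor contains an opening brace. The bracket expression `[{]` keeps the brace from being read as the start of an interval expression.

```shell
#!/usr/bin/env bash

# Hypothetical helper: the sample data from the question.
sample() {
  printf '%s\n' 125 126 '127    {' 566 567 568 \
    '569       # blah blah' '570    {  #blah blah' 700 '701    {'
}

# Hypothetical helper: print a sequence start when the previous line had a brace.
starts_awk_prev() {
  awk '
    NR == 1 || prev ~ /[{]/ { print $1 }   # line 1, or the line after a brace line
    { prev = $0 }                          # remember this line for the next record
  ' "$@"
}

sample | starts_awk_prev
```

Because `NR` counts across all input files, this sketch assumes a single input file.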

Karl Nordström answered Nov 27 '22 21:11


sed -n '1p;/{/{
N
s/.*\n\([0-9][0-9]*\).*/\1/p
}' input_file
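Here is an annotated copy of the sed approach wrapped in a hypothetical `starts_sed` function, written with the portable `[0-9][0-9]*` (`\+` in a basic regex is a GNU extension). The comments describe what each sed command does; `sample` is likewise a hypothetical helper.

```shell
#!/usr/bin/env bash

# Hypothetical helper: the sample data from the question.
sample() {
  printf '%s\n' 125 126 '127    {' 566 567 568 \
    '569       # blah blah' '570    {  #blah blah' 700 '701    {'
}

#   -n          do not print lines automatically; only "p" prints
#   1p          print line 1, the first sequence start
#   /{/{ ... }  on every line that contains an opening brace:
#     N         append the next input line to the pattern space
#     s/.../p   keep the digits after the embedded newline and print them
starts_sed() {
  sed -n '1p;/{/{
N
s/.*\n\([0-9][0-9]*\).*/\1/p
}' "$@"
}

sample | starts_sed
```

One consequence of `N` is that the appended line is consumed without being checked for a brace. If a sequence could be a single braced line directly after another brace line, the start following it would be missed; the awk approaches look at every line and do not have this limitation.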

perreal answered Nov 27 '22 23:11