
Is it possible to match consecutive lines that start with the same word/pattern?

Tags:

regex

I realize that this is probably not the smartest thing to do with regular expressions, but I was wondering whether it is possible in theory.

Given an example text file:

MYL3    P08590
MYL3    B2R534
MYL3    Q9NRS8
TM38A   Q9H6F2
TM38A   A8K9P9
TRFE    P02787
TRFE    O43890
TRFE    Q1HBA5
TRFE    Q9NQB8
TRFE    Q9UHV0
ETFA    P13804
KCRM    P06732
KCRM    Q96QL9

... would it be possible, using regular expressions alone, to match the lines that start with the same pattern as the previous line? Replacing those lines with nothing would give something like:

MYL3    P08590
TM38A   Q9H6F2
TRFE    P02787
ETFA    P13804
KCRM    P06732

My guess is that even though multiline matching can look at the previous line, this couldn't be done with regular expressions alone, because there is no fixed pattern to match, only the first word (or first few words) of consecutive lines. It would require storing the beginning of a line as a "variable" and comparing the beginning of the next line against it, which as far as I know is not possible with regex alone.

A colleague, on the other hand, claimed that it might be possible depending on the regex implementation. I thought I would ask the experts here. :)

posdef asked Aug 23 '11 12:08



1 Answer

You can use this regex:

(?s)(\w+)\s+\w+\r\n(\1\s+\w+(?:\r\n)?)+
  1. (?s) - single line option enabled (it only changes what . matches, so it has no effect on this pattern)
  2. (\w+) - word characters (letters, digits, underscore), one or more repetitions, captured as group 1
  3. \s+ - whitespace, one or more repetitions
  4. \w+ - word characters, one or more repetitions
  5. \r\n - a Windows-style line break (carriage return followed by line feed)
  6. (\1\s+\w+(?:\r\n)?) - group 2, one or more repetitions: a back reference to group 1, whitespace (one or more), word characters (one or more), then \r\n zero or one time
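A minimal sketch to try the pattern, assuming Python's re module (my choice; the answer doesn't name a regex flavor). It runs the pattern unchanged over the question's sample data. Because the pattern spells out \r\n, the sample is joined with Windows line endings; on a file with plain \n endings it would find nothing.

```python
import re

# The question's sample data. The answer's pattern spells out \r\n,
# so the lines are joined with Windows-style (CRLF) line breaks here.
LINES = [
    "MYL3    P08590",
    "MYL3    B2R534",
    "MYL3    Q9NRS8",
    "TM38A   Q9H6F2",
    "TM38A   A8K9P9",
    "TRFE    P02787",
    "TRFE    O43890",
    "TRFE    Q1HBA5",
    "TRFE    Q9NQB8",
    "TRFE    Q9UHV0",
    "ETFA    P13804",
    "KCRM    P06732",
    "KCRM    Q96QL9",
]
text = "\r\n".join(LINES) + "\r\n"

# The answer's pattern, unchanged. The back reference \1 is what lets the
# regex compare each following line's first word with the first line's.
PATTERN = re.compile(r"(?s)(\w+)\s+\w+\r\n(\1\s+\w+(?:\r\n)?)+")

# Each match is one whole run of consecutive lines sharing a first word.
blocks = [m.group(0) for m in PATTERN.finditer(text)]
for block in blocks:
    print(repr(block))
```

This prints four runs (MYL3, TM38A, TRFE and KCRM). The lone ETFA line is not matched, because group 2 has to repeat at least once.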

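It will match (given CRLF line endings) each run of two or more consecutive lines that begin with the same word: the three MYL3 lines, the two TM38A lines, the five TRFE lines and the two KCRM lines. The single ETFA line is not matched.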
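The answer's pattern matches each whole run, so replacing a match with nothing would delete the run entirely rather than produce the output asked for in the question. A possible adaptation (my own variant, not the answer's exact pattern, again assuming Python's re module) captures the first line of each run and substitutes it back. It also anchors at line starts with (?m)^, so a run can't start partway through a word such as XMYL3, and it accepts either \n or \r\n. The helper name keep_first_of_each_run is hypothetical.

```python
import re

# Question's sample data with plain \n line endings.
SAMPLE = """\
MYL3    P08590
MYL3    B2R534
MYL3    Q9NRS8
TM38A   Q9H6F2
TM38A   A8K9P9
TRFE    P02787
TRFE    O43890
TRFE    Q1HBA5
TRFE    Q9NQB8
TRFE    Q9UHV0
ETFA    P13804
KCRM    P06732
KCRM    Q96QL9
"""

# Adapted pattern (not the answer's exact one):
#   (?m)^        match only at the start of a line
#   (\w+)        group 1: the first word, the "variable" the question asks about
#   ([ \t].*)    group 2: the rest of that first line; . stops at \n, and with
#                CRLF input it keeps the \r, so line endings are preserved
#   (?:\r?\n\1[ \t].*)+
#                one or more following lines whose first word equals group 1;
#                [ \t] after \1 keeps a longer word (e.g. TRFEX) from counting
DEDUP = re.compile(r"(?m)^(\w+)([ \t].*)(?:\r?\n\1[ \t].*)+")


def keep_first_of_each_run(text: str) -> str:
    """Hypothetical helper: collapse each run of lines to its first line."""
    return DEDUP.sub(r"\1\2", text)


print(keep_first_of_each_run(SAMPLE), end="")
```

The printed result is the five lines shown in the question (MYL3, TM38A, TRFE, ETFA, KCRM, each with its first accession number).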

Kirill Polishchuk answered Nov 15 '22 08:11
