
Regex matching specific value after a certain number of tabs

Tags:

regex

perl

In a tab-delimited text file, I would like to match only the lines that contain the value "1" right after the 24th tab.
Right now, the regex I have seems to match what I want, but it breaks when the line doesn't match.
Could you help me improve it?

My regex:

/(?:.+?\t){24}1/  

Sample input:

INT E_63    0   0   u   Le  Le  DET:ART DET le  ??  ADJ SENT DET:ART NOM ADV    SENT DET NOM    1   ??  ??  ??  ??  ??  0   0   0   0   0   1   ??  ??  ??  ??  ??  ??  
INT E_63    0   0   u   Le  Le  DET:ART DET le  ??  ADJ SENT DET:ART NOM ADV    SENT DET NOM    1   ??  ??  ??  ??  ??  0   0   0   0   0   0   ??  ??  ??  ??  ??  ??  

(The first line should match, the second should not.)

Azaghal asked Sep 12 '25 17:09



2 Answers

Your regex is very slow to fail when there is no match because of catastrophic backtracking: . also matches a tab character, so .+? can run across field boundaries. Coupled with the fact that there are more subpatterns after the group with nested quantifiers, and the absence of the ^ anchor, catastrophic backtracking is inevitable on non-matching lines.
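
The same tab-spanning behaviour can also produce a false positive, which is easier to see on a scaled-down case. The following is a minimal sketch, assuming a target of 2 tabs instead of 24; the variable names and the matches() helper are made up for illustration, and the second pattern simply swaps in a tab-excluding class with a start anchor.

```perl
use strict;
use warnings;

# Scaled-down illustration: look for "1" right after the 2nd tab
# (the question uses 24; 2 keeps the example readable).
my $loose  = qr/(?:.+?\t){2}1/;      # same shape as the pattern in the question
my $strict = qr/^(?:[^\t]*\t){2}1/;  # tab-excluding class, anchored at the start

# Here "1" sits after the 3rd tab, so the line should NOT match.
my $wrong_column = "a\tb\tc\t1";

# Hypothetical helper: returns 1 if the line matches the pattern, 0 otherwise.
sub matches {
    my ($re, $line) = @_;
    return $line =~ $re ? 1 : 0;
}

printf "loose:  %d\n", matches($loose,  $wrong_column);
printf "strict: %d\n", matches($strict, $wrong_column);
```

Run as written, this should report 1 for the loose pattern and 0 for the strict one: after failing to find "1" behind the 2nd tab, .+? is allowed to grow over "b<TAB>c", which shifts the match one column to the right.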

What you need is to use a negated character class, [^\t], and to anchor the pattern at the start of the string:

/^(?:[^\t]*\t){24}1/

See the regex demo.

NOTE: To match the 1 as a whole word, you might consider adding \b after it, or a lookahead (?!\S).
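
To compare the two suggestions in the note, here is a small sketch, again scaled down to 2 leading tabs (an assumption made for readability). The %variant hash and matching_variants() are illustrative names, not part of the original answer.

```perl
use strict;
use warnings;

# Scaled down to 2 leading tabs; the question uses 24.
my %variant = (
    plain     => qr/^(?:[^\t]*\t){2}1/,        # no check after the 1
    boundary  => qr/^(?:[^\t]*\t){2}1\b/,      # word boundary after the 1
    lookahead => qr/^(?:[^\t]*\t){2}1(?!\S)/,  # whitespace or end of string after the 1
);

# Hypothetical helper: returns the sorted names of the variants that match.
sub matching_variants {
    my ($line) = @_;
    return join ',', grep { $line =~ $variant{$_} } sort keys %variant;
}

for my $field ('1', '10', '1.5') {
    printf "%-4s => %s\n", $field, matching_variants("a\tb\t$field\tc");
}
```

With these inputs, a field of 1 is accepted by all three variants, 10 only by the plain one, and 1.5 by the plain and \b variants but not by (?!\S). So the lookahead is the stricter choice if the field must be exactly 1.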

Details:

  • ^ - start of a string
  • (?:[^\t]*\t){24} - 24 sequences of
    • [^\t]* - 0+ chars other than a tab char
    • \t - a tab char
  • 1 - a 1 char.
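
Putting the pieces together, a line filter could look roughly like the sketch below. It assumes you want the field to be exactly 1, so it adds the (?!\S) lookahead; lines_with_flag(), its $tabs parameter, and the synthetic rows are hypothetical and only there to make the example self-contained.

```perl
use strict;
use warnings;

# Returns the lines whose field right after the $tabs-th tab is exactly "1".
# $tabs is a hypothetical parameter; the question uses 24.
sub lines_with_flag {
    my ($tabs, @lines) = @_;
    my $re = qr/^(?:[^\t]*\t){$tabs}1(?!\S)/;
    return grep { $_ =~ $re } @lines;
}

# Synthetic rows: 24 filler fields, then the flag field, then a trailing field.
my @rows = map { join("\t", ('x') x 24, $_, 'tail') } qw(1 0 10);

# Only the first row (flag field "1") should be printed.
print "$_\n" for lines_with_flag(24, @rows);
```

Because every iteration of the group consumes exactly one tab, the engine has only one way to reach the 25th field, so non-matching lines are rejected quickly.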
Wiktor Stribiżew answered Sep 14 '25 05:09



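Instead of using a regex, you could just split the line on tabs, check the 25th column (index 24, i.e. the field right after the 24th tab), and then use conditionals.

#!/usr/bin/perl
use strict;
use warnings;

open(my $fh, '<', '/path/to/tab_delem_file') or die "Could not open file: $!";

while (my $row = <$fh>) {
    chomp $row;
    # Split on tab; the -1 limit keeps trailing empty fields.
    my @fields = split /\t/, $row, -1;
    # Compare as a string: fields such as "??" are not numeric,
    # and short lines may have no 25th field at all.
    if (defined $fields[24] && $fields[24] eq '1') {
        # do something
    }
    else {
        # do something else
    }
}
close $fh;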
Chankey Pathak answered Sep 14 '25 06:09
