
Matching across multiple lines regular expression

Tags:

regex

I have several lists in a single text file that look like the one below. Each list always starts with 0 and always ends with a line that begins with the word Unique. I would like to remove everything apart from the line with Unique on it. I looked through Stack Overflow and tried the following, but it returns the whole text file (there are other strings in the file that I haven't included in this example). Basically, the problem is how to account for the newlines in the regex.

^0(.|\n)*
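A quick check in Python shows why this pattern returns everything. The question does not name a tool, so Python's re module and "\n" line endings are assumptions here, and the text is a shortened, hypothetical version of the input:

```python
import re

# Shortened version of the input, followed by a stand-in for the
# "other strings" in the file (assumption: "\n" line endings).
text = "0       145\n1       139\n2       175\nUnique: 252851\nsome other text\n"

# (.|\n)* is greedy and nothing in the pattern tells it to stop at "Unique:",
# so the match runs from the leading 0 all the way to the end of the input.
m = re.search(r"^0(.|\n)*", text)
print(m.end() == len(text))  # True: the whole rest of the file is matched
```

The newline handling itself works; what is missing is a point where the match is forced to stop.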

Input:

0       145
1       139
2       175
3       171
4       259
5       262
6       293
7       401
8       430
9       417
10      614
11      833
12      1423
13      3062
14      10510
15      57587
16      5057575
17      10071
18      375
19      152
20      70
21      55
22      46
23      31
24      25
25      22
26      25
27      14
28      16
29      16
30      8
31      10
32      8
33      21
34      8
35      51
36      65
37      605
38      32
39      2
40      1
41      2
44      1
48      2
51      1
52      1
57      1
63      2
68      1
82      1
94      1
95      1
101     3
102     7
103     1
110     1
111     1
119     1
123     1
129     2
130     3
131     2
132     1
135     1
136     2
137     7
138     4
Unique: 252851

Expected output:

Unique: 252851
Sebastian Zeki asked Mar 04 '26 17:03


2 Answers

You need to use something like

^0[\s\S]*?[\n\r]Unique:

and replace with Unique:.

  • ^ - start of a line (in multiline mode)
  • 0 - a literal 0
  • [\s\S]*? - zero or more of any character, newlines included, as few as possible (lazy)
  • [\n\r] - a line break character
  • Unique: - the literal text Unique:
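A minimal Python sketch of this replacement, assuming the list is embedded in other text (the "header line" and "footer line" here are hypothetical stand-ins). In Python, ^ only matches at the start of each line when re.MULTILINE is set, so that flag is added:

```python
import re

text = (
    "header line\n"
    "0       145\n"
    "1       139\n"
    "138     4\n"
    "Unique: 252851\n"
    "footer line\n"
)

# [\s\S]*? matches any character (newlines included) as few times as possible,
# so the match ends at the first line break that is followed by "Unique:".
pattern = re.compile(r"^0[\s\S]*?[\n\r]Unique:", re.MULTILINE)

# The matched "Unique:" is consumed, so it is put back in the replacement.
result = pattern.sub("Unique:", text)
print(result)
```

Only the list is removed; the surrounding lines are left untouched.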

Another possible regex is:

^0[^\r]*(?:\r(?!Unique:)[^\r]*)*

where \r is the line ending used in the current file (use \n instead for Unix-style line endings). Replace the match with an empty string.
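A sketch of this second pattern in Python, with \n substituted for \r on the assumption that the file uses Unix line endings. Each (?:\n(?!Unique:)[^\n]*) step consumes one more line, but only if that line does not start with Unique:

```python
import re

text = "0       145\n1       139\n138     4\nUnique: 252851\n"

# [^\n]* eats the first line; the repeated group eats each following line
# unless the negative lookahead sees "Unique:" right after the line break.
pattern = re.compile(r"^0[^\n]*(?:\n(?!Unique:)[^\n]*)*", re.MULTILINE)
result = pattern.sub("", text)
print(repr(result))  # '\nUnique: 252851\n'
```

Because the line break just before Unique: is not part of the match, it stays in the output, so a blank line is left where the list used to be.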

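Note that you could also use the (?m)^0.*?[\r\n]Unique: regex (again replacing with Unique:) with the (?m) option:

m: in Ruby (Onigmo), this option makes the dot (.) match newlines. In PCRE, Java, .NET and Python the equivalent option is (?s); there, (?m) only makes ^ and $ match at line boundaries.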
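In Python, the dot-matches-newline option is (?s) (re.S), so a sketch of this variant combines (?s) with (?m), the latter letting ^ match at the start of any line. The "intro" line is a hypothetical stand-in for other text before the list:

```python
import re

text = "intro\n0       145\n1       139\n138     4\nUnique: 252851\n"

# (?s): "." also matches newlines; (?m): ^ matches at every line start.
pattern = re.compile(r"(?sm)^0.*?[\r\n]Unique:")
result = pattern.sub("Unique:", text)
print(result == "intro\nUnique: 252851\n")  # True
```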

Wiktor Stribiżew answered Mar 06 '26 19:03


Your method of matching newlines should work, although it's not optimal (alternation is rather slow). The real problem is that (.|\n)* runs to the end of the file, so you need to make sure the match stops right before Unique:

(?s)^0.*(?=Unique:)

should work if there is only one Unique: in your file.

Explanation:

(?s)         # Start "dot matches all" (including newlines) mode
^0           # Match "0" at the start of the file
.*           # Match as many characters as possible
(?=Unique:)  # but then backtrack until you're right before "Unique:"
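A Python sketch of this pattern, also illustrating the "only one Unique:" caveat with a hypothetical second list. The lazy variant at the end is not part of the answer; it is one possible way to handle several lists:

```python
import re

one_list = "0       145\n1       139\nUnique: 252851\n"
# Hypothetical second list appended to show what happens with two "Unique:" lines.
two_lists = one_list + "0       7\nUnique: 9\n"

# (?s) lets "." cross line breaks; the greedy .* runs to the end of the input,
# then backtracks to the LAST position that is followed by "Unique:".
pattern = re.compile(r"(?s)^0.*(?=Unique:)")
print(repr(pattern.sub("", one_list)))   # 'Unique: 252851\n'
print(repr(pattern.sub("", two_lists)))  # 'Unique: 9\n' (first list's total is lost)

# Lazy variant (not from the answer): .*? stops at the FIRST following "Unique:",
# and (?m) lets ^0 find the start of every list, not just the start of the input.
lazy = re.compile(r"(?sm)^0.*?(?=Unique:)")
print(repr(lazy.sub("", two_lists)))     # 'Unique: 252851\nUnique: 9\n'
```

Since the lookahead does not consume Unique:, the replacement can be an empty string.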
Tim Pietzcker answered Mar 06 '26 19:03


