Matching across multiple lines regular expression

Question

I have several lists in a single text file that look like the one below. Each list always starts with 0 and always ends with a line that begins with the word Unique. I would like to remove everything in each list except the line containing Unique. I looked through Stack Overflow and tried the following, but it selects the whole text file (there are other strings in the file that I haven't included in this example). Basically, the problem is how to account for the newlines in the regex.

^0(.|\n)*
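A short sketch using Python's re module shows why this pattern returns everything. Python is only an assumption here, since the question does not say which tool or regex flavor is in use, and the sample text is made up. (.|\n)* is greedy and nothing after it tells it where to stop, so the match runs to the end of the input:

```python
import re

# Made-up sample: one list followed by an unrelated line of text.
text = "0       145\n1       139\nUnique: 284\nsome other line\n"

# The attempted pattern: greedy (.|\n)* with nothing after it to stop on.
m = re.search(r"^0(.|\n)*", text)

# The match runs to the very end of the input, unrelated lines included.
print(m.group(0) == text)  # True
```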

Input:

0       145
1       139
2       175
3       171
4       259
5       262
6       293
7       401
8       430
9       417
10      614
11      833
12      1423
13      3062
14      10510
15      57587
16      5057575
17      10071
18      375
19      152
20      70
21      55
22      46
23      31
24      25
25      22
26      25
27      14
28      16
29      16
30      8
31      10
32      8
33      21
34      8
35      51
36      65
37      605
38      32
39      2
40      1
41      2
44      1
48      2
51      1
52      1
57      1
63      2
68      1
82      1
94      1
95      1
101     3
102     7
103     1
110     1
111     1
119     1
123     1
129     2
130     3
131     2
132     1
135     1
136     2
137     7
138     4
Unique: 252851

Expected output:

Unique: 252851

Wiktor Stribiżew · Accepted Answer

You need to use something like

^0[\s\S]*?[\r\n]Unique:

and replace with Unique:.

^ - start of a line
0 - a literal 0
[\s\S]*? - zero or more characters, including newlines, as few as possible
[\r\n] - a line break character (CR or LF)
Unique: - the literal text Unique:
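Here is a minimal Python sketch of that replacement, assuming Python's re as the engine and a made-up sample with \n line endings. re.M is passed so that ^ matches at the start of every line, which lets the pattern find each list rather than only one at the very start of the string:

```python
import re

# Made-up input: two lists, each ending in a "Unique:" line, plus other text.
text = (
    "0       145\n"
    "1       139\n"
    "Unique: 252851\n"
    "other text\n"
    "0       12\n"
    "1       7\n"
    "Unique: 19\n"
)

# re.M: ^ matches at every line start, not just at the start of the string.
# [\s\S]*? lazily consumes anything, newlines included, up to the first
# line break that is followed by "Unique:".
pattern = re.compile(r"^0[\s\S]*?[\r\n]Unique:", re.M)

result = pattern.sub("Unique:", text)
print(result)
```

The lazy *? is what keeps each match inside a single list; a greedy [\s\S]* would run on to the last Unique: in the file.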

Another possible regex is:

^0[^\n]*(?:\n(?!Unique:)[^\n]*)*

where \n is the line-ending character used in the current file. Replace with an empty string.
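A Python sketch of this second approach, under the same assumptions (Python's re, \n line endings, made-up sample). The match stops right before the \n that precedes Unique:, so that line break survives the replacement and shows up as an empty line, which the sketch filters out:

```python
import re

text = "0       145\n1       139\nUnique: 252851\n0       12\nUnique: 19\n"

# Match the line starting with 0, then each following line (introduced by \n)
# as long as that line does not start with "Unique:".
pattern = re.compile(r"^0[^\n]*(?:\n(?!Unique:)[^\n]*)*", re.M)

result = pattern.sub("", text)
# The \n before each "Unique:" line is left behind, so drop the empty lines.
kept = [line for line in result.splitlines() if line]
print(kept)
```

Because each repetition of the group must start with \n and [^\n]* cannot cross a line break, the engine has only one way to match each line, so this pattern does not backtrack heavily on long lists.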

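Note that you could also use the regex (?m)^0.*?[\r\n]Unique: (again replacing with Unique:) with the (?m) option:

m: multi-line (dot (.) matches newlines)

That meaning of (?m) comes from Ruby (Onigmo). In most other flavors, such as PCRE, Python, Java and .NET, (?m) only makes ^ and $ match at line boundaries, and the option that lets . match newlines is (?s), so there the pattern would be (?ms)^0.*?[\r\n]Unique:.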
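Flavors disagree on what (?m) does. In Python's re, used here only as an example flavor with a made-up sample, (?m) alone does not let . cross line breaks, so the pattern finds nothing in a multi-line list; adding s, as in (?ms), gives the intended behavior:

```python
import re

text = "0       145\n1       139\nUnique: 252851\n"

# (?m) only: ^ and $ work per line, but . still stops at \n, so no match.
only_m = re.search(r"(?m)^0.*?[\r\n]Unique:", text)
print(only_m)  # None

# (?ms): in Python the s flag is what lets . match newlines.
result = re.sub(r"(?ms)^0.*?[\r\n]Unique:", "Unique:", text)
print(repr(result))  # 'Unique: 252851\n'
```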

Tim Pietzcker · Answer

Your method of matching newlines should work, although it's not optimal (alternation is rather slow). The next problem is to make sure the match stops right before Unique:, which a lookahead can do:

(?s)^0.*(?=Unique:)

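should work if there is only one Unique: in your file. With several lists, the greedy .* backtracks only as far as the last Unique:, so everything before it, including the earlier Unique: lines, is matched as well.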

Explanation:

(?s)         # Turn on "dot matches all" mode (. also matches newlines)
^0           # Match "0" at the start of the file
.*           # Match as many characters as possible
(?=Unique:)  # but then backtrack until you're right before "Unique:"
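The caveat about a single Unique: can be checked with a small Python sketch (Python's re and the sample strings are assumptions). On one list the pattern leaves only the Unique: line; on two lists the greedy .* backtracks only to the last Unique:, so the first list's Unique: line disappears too. The final pattern is a hypothetical lazy variant, not part of this answer, that stops at the first line starting with Unique:.

```python
import re

one_list = "0       145\n1       139\nUnique: 252851\n"
two_lists = "0       1\nUnique: 2\n0       3\nUnique: 4\n"

# The pattern from the answer: greedy .* in dot-matches-all mode.
greedy = r"(?s)^0.*(?=Unique:)"
single = re.sub(greedy, "", one_list)
double = re.sub(greedy, "", two_lists)
print(repr(single))  # 'Unique: 252851\n'
print(repr(double))  # 'Unique: 4\n' (the first Unique: line is gone)

# Hypothetical lazy variant: (?m) lets ^ match at every line start, and
# .*? stops at the first line that begins with "Unique:".
lazy = r"(?ms)^0.*?(?=^Unique:)"
fixed = re.sub(lazy, "", two_lists)
print(repr(fixed))  # 'Unique: 2\nUnique: 4\n'
```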

Tags:

regex

Sebastian Zeki