Get shortest match using regex

Question

I’m trying to do something with a regex, but I’m not sure it’s even possible.

I work on the French Wiktionary, and I’m trying to find lines that contain only #* so I can replace them. The problem is that I also need the parameter of the nearest preceding langue template: in {{langue|fr}}, for example, I need to get fr.

Here is an example of text I have:

== {{langue|fr}} ==
=== {{S|étymologie}} ===
: Emprunté au {{étyl|ja|fr|mot=津波|tr=tsunami}} du même sens, littéralement « [[vague]] [[portuaire]] ».

=== {{S|nom|fr}} ===
{{fr-rég|tsu.na.mi|pron2=tsy.na.mi}}
'''tsunami''' {{pron|tsu.na.mi|fr}} ''ou'' {{pron|tsy.na.mi|fr}} {{m}}
# Énorme [[vague]] causée par un [[évènement]] [[géologique]] comme un [[séisme]] ou une [[éruption]] volcanique ou [[astronomique]] comme un [[météorite]].
#* ''Le '''tsunami''' de décembre 2004 a balayé l’Asie du Sud-Est.''

== {{langue|en}} ==
=== {{S|étymologie}} ===
: Du {{étyl|ja|en|mot=津波|tr=tsunami}}.

=== {{S|nom|en}} ===
{{en-nom|tsunami|tsunami|p2=tsunamis|tsu.ˈnɑ.mi|tsu.ˈnɑ.mi|pp2=tsu.ˈnɑ.miz}}
'''tsunami'''
# [[#fr|Tsunami]].
#* {{ébauche-exe|en}}

== {{langue|es}} ==
=== {{S|étymologie}} ===
: Du {{étyl|ja|es|mot=津波|tr=tsunami|sens=}}.

=== {{S|nom|es}} ===
{{es-rég|}}
'''tsunami''' {{pron||es}} {{m}}
# [[#fr|Tsunami]].
#*

== {{langue|sv}} ==
=== {{S|étymologie}} ===
: {{ébauche-étym|sv}}

=== {{S|nom|sv}} ===
{{sv-nom-c-er|2=tsunamin}}
'''tsunami''' {{pron||sv}} {{c}}
# [[tsunami#fr|Tsunami]].
#* {{ébauche-exe|sv}}

I tried this regex:

{{langue\|([^}]+)}}((?:.|\n)+)(#+\*) ?'*

The problem is that it matches nearly the entire text, which is not what I want. In my example, the incorrect line is in the es section, so the parameter I need to fetch is es. My regex has three capturing groups: the first for the language code, the second for all the text between the other two groups, and the third for the beginning of the line, since the number of # can vary. If this is possible, I will replace the matched string with {{langue|$1}}$2$3 {{ébauche-exe|$1}}.
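The reason it matches so much is that a regex engine reports the leftmost match: the search starts at the first {{langue|...}} it finds (fr), and making the quantifier lazy only moves the end of the match, not its start. A minimal Python sketch of this, assuming the \n that appears to have been lost from the pattern above and using a trimmed copy of the sample text:

```python
import re

# Trimmed copy of the sample wikitext (assumption: only the relevant lines kept).
TEXT = """== {{langue|fr}} ==
# Énorme [[vague]].
#* ''Le '''tsunami''' de décembre 2004 a balayé l’Asie du Sud-Est.''

== {{langue|en}} ==
# [[#fr|Tsunami]].
#* {{ébauche-exe|en}}

== {{langue|es}} ==
# [[#fr|Tsunami]].
#*

== {{langue|sv}} ==
# [[tsunami#fr|Tsunami]].
#* {{ébauche-exe|sv}}
"""

# The pattern from the question, with (?:.|\n) restored.
GREEDY = re.compile(r"{{langue\|([^}]+)}}((?:.|\n)+)(#+\*) ?'*")
# A lazy variant, anchored to the end of a line so only bare #* lines qualify.
LAZY = re.compile(r"{{langue\|([^}]+)}}((?:.|\n)+?)(#+\*) ?'*$", re.MULTILINE)

for name, pattern in (("greedy", GREEDY), ("lazy", LAZY)):
    match = pattern.search(TEXT)
    # Both searches start at the first header, so group 1 is "fr", not "es",
    # and group 2 runs across the es header.
    print(name, match.group(1), "{{langue|es}}" in match.group(2))
# prints:
# greedy fr True
# lazy fr True
```

So the fix is not a lazier quantifier but a constraint that stops the middle group from crossing another language header.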

Is this possible with this kind of regex? If so, how? If not, is there another way to do this with a regex?

Wiktor Stribiżew · Accepted Answer

You can use

(?m)^== {{langue\|([^{}]+)}}(.*(?:\n(?!== {{langue\|[^{}]+}}).*)*)(#+\*) ?'*$

See the regex demo.
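A minimal Python sketch of the full substitution, assuming a trimmed copy of the sample text. Two details differ from the replacement string in the question: Python writes backreferences as \1 rather than $1, and because this pattern starts with "== ", the replacement has to put that prefix back (the trailing " ==" of the header line ends up in Group 2, so it is preserved automatically).

```python
import re

# Trimmed copy of the sample wikitext (assumption: only the relevant lines kept).
TEXT = """== {{langue|fr}} ==
# Énorme [[vague]].
#* ''Le '''tsunami''' de décembre 2004 a balayé l’Asie du Sud-Est.''

== {{langue|en}} ==
# [[#fr|Tsunami]].
#* {{ébauche-exe|en}}

== {{langue|es}} ==
# [[#fr|Tsunami]].
#*

== {{langue|sv}} ==
# [[tsunami#fr|Tsunami]].
#* {{ébauche-exe|sv}}
"""

# The pattern from the answer, on one line with \n written as an escape.
PATTERN = re.compile(
    r"(?m)^== {{langue\|([^{}]+)}}(.*(?:\n(?!== {{langue\|[^{}]+}}).*)*)(#+\*) ?'*$"
)

# \1 = language code, \2 = text up to the bare line, \3 = the #... prefix.
# The leading "== " was consumed by the pattern, so it is restored here.
REPLACEMENT = r"== {{langue|\1}}\2\3 {{ébauche-exe|\1}}"

result = PATTERN.sub(REPLACEMENT, TEXT)
print(result)  # the bare #* line in the es section becomes "#* {{ébauche-exe|es}}"
```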

Details:

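(?m)^ - start of a line ((?m) turns on multiline mode, so ^ and $ match at the start and end of every line)
== - the literal string == followed by a space
{{langue\| - the literal string {{langue| (the pipe is escaped)
([^{}]+) - Group 1: one or more chars other than { and }
}} - the literal string }}
(.*(?:\n(?!== {{langue\|[^{}]+}}).*)*) - Group 2: the rest of the current line, then zero or more further lines, none of which may start with == {{langue|, one or more chars other than { and }, and }} (so Group 2 cannot run into the next language section)
(#+\*) - Group 3: one or more # chars followed by a * char
A space followed by ? - an optional space
'* - zero or more ' chars
$ - end of a line.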
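In a single pass, this pattern fixes one bare #* line per language section: Group 2 is greedy, so each match runs to the last such line in its section, and any earlier bare #* lines in the same section stay inside Group 2 unchanged (running the replacement again picks them up). If a page can contain several, a different technique, swapped in here rather than taken from the answer, is to scan line by line and remember the code from the most recent {{langue|...}} header. A minimal Python sketch; the names HEADER, BARE and fill_empty_examples are hypothetical:

```python
import re

# Hypothetical helper patterns (not from the answer).
# A language header such as "== {{langue|es}} ==", capturing the code.
HEADER = re.compile(r"^== \{\{langue\|([^{}]+)\}\}")
# A line made only of #..#*, an optional space and optional ' chars,
# mirroring "(#+\*) ?'*$" from the regex above.
BARE = re.compile(r"^(#+\*) ?'*$")


def fill_empty_examples(text: str) -> str:
    """Replace every bare #* line with '#* {{ébauche-exe|<code>}}', where
    <code> comes from the nearest preceding {{langue|...}} header."""
    lang = None
    out = []
    for line in text.split("\n"):
        header = HEADER.match(line)
        if header:
            lang = header.group(1)  # entering a new language section
        bare = BARE.match(line)
        if bare and lang is not None:
            line = bare.group(1) + " {{ébauche-exe|" + lang + "}}"
        out.append(line)
    return "\n".join(out)  # split/join keeps a trailing newline intact


sample = "== {{langue|es}} ==\n#*\n##* \n"
print(fill_empty_examples(sample))  # both bare lines get {{ébauche-exe|es}}
```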

Tags: regex

Lepticed