Select row without substring

Question

I'm trying to extract just the names from lines like these (from a dump of the Slovak Wikipedia):

    |Meno = Hans Joachim
|Plné meno = Aristoteles (???????????)
|Plné meno = Francis Bacon
|Plné meno = Sokrates ({{Cudzojazyčne|grc|????????|pc=n}})
|Meno            = Svätý František z Assisi <br /> ''(Giovanni Battista Bernardone)''
  |Meno = Friedrich Ludwig Gottlob Frege
   |Meno             = Adam František Kollár (Kolárik)
|meno    = [[J. Edgar Hoover|John Edgar Hoover]]
|meno    = [[Benedikt XIV. (1740 – 1758)|Benedikt XIV.]]
|meno    = [[Milan Rastislav Štefánik|Milan Rastislav Štefánik]]
   |Meno             = '''Ján Filc'''
  |Meno = Jean le Rond d'Alembert

The output should be:

Hans Joachim
Aristoteles
Francis Bacon
Sokrates
Svätý František z Assisi
Friedrich Ludwig Gottlob Frege
Adam František Kollár (Kolárik)
J. Edgar Hoover|John Edgar Hoover
Benedikt XIV. (1740 – 1758)|Benedikt XIV.
Milan Rastislav Štefánik|Milan Rastislav Štefánik
Ján Filc
Jean le Rond d'Alembert

When the value contains only the name, this regular expression works fine: "= *(.*?)$". But when the line also contains things like "(???????????)", HTML tags, or something between "{{" and "}}", I cannot select the name without the unwanted substring.

I tried a lot of options on this regex tester page (http://regex101.com/r/gS8iQ9/1), but none of them worked.

In Java I'm using:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern pattern = Pattern.compile("= *(.*?)$");
Matcher matcher = pattern.matcher(line);
if (matcher.find()) {
    String foundSubstring = matcher.group(1);
    // ...
}
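To see why "= *(.*?)$" keeps the unwanted parts, here is a minimal self-contained version of that snippet. The class OriginalPatternDemo and its helper extract are hypothetical wrappers added only for illustration. Because "$" forces the match to reach the end of the line, the lazy ".*?" has no choice but to expand over everything after the first "=", so templates and "[[...]]" markup end up in group 1.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OriginalPatternDemo {
    // The question's pattern: "=", optional spaces, then a lazy group anchored to end of line.
    private static final Pattern PATTERN = Pattern.compile("= *(.*?)$");

    // Hypothetical helper: returns group 1 of the first match, or null if the line has no "=".
    static String extract(String line) {
        Matcher matcher = PATTERN.matcher(line);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "|Plné meno = Francis Bacon",
            "|Plné meno = Sokrates ({{Cudzojazyčne|grc|????????|pc=n}})",
            "|meno    = [[J. Edgar Hoover|John Edgar Hoover]]");
        for (String line : lines) {
            // Clean lines come out fine; the others keep everything after the first "=".
            System.out.println(extract(line));
        }
    }
}
```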

Thanks for any help or suggestions on how to select the text after "=" without the question marks, HTML code and so on.

Bohemian · Accepted Answer

Your regex was almost right, but your input is a bit tricky to work with. You can do it in one line:

String name = line.replaceAll(".*?=[\\[ ']*([\\p{L}0-9|'. ()–]+[\\p{L}.)]).*", "$1");


I have tested this and it produced your desired output given your sample input.
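A minimal runnable sketch that applies the one-liner to every sample line from the question. The class name WikiNameExtractor and the method extractName are hypothetical; the pattern is only split into commented pieces for readability, and the concatenated string is the same regex as in the answer. It assumes the file is saved and compiled as UTF-8 (the default from JDK 18 on; otherwise pass "-encoding UTF-8" to javac), since the en dash in the pattern and the Slovak sample text are non-ASCII.

```java
import java.util.List;

public class WikiNameExtractor {
    // The answer's regex in commented pieces; backslashes are doubled for Java string literals.
    static final String NAME_REGEX =
          ".*?="                   // everything up to and including the first '='
        + "[\\[ ']*"               // skip any run of spaces, '[' and apostrophes ("[[", "'''")
        + "("                      // start group 1: the name
        + "[\\p{L}0-9|'. ()–]+"    // letters in any script, digits, | ' . space ( ) and en dash
        + "[\\p{L}.)]"             // the name must end in a letter, a dot or ')'
        + ")"                      // end group 1
        + ".*";                    // swallow the rest: templates, <br />, ]], '''

    // Replaces the whole line with group 1; a line the pattern does not match is returned unchanged.
    static String extractName(String line) {
        return line.replaceAll(NAME_REGEX, "$1");
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "    |Meno = Hans Joachim",
            "|Plné meno = Aristoteles (???????????)",
            "|Plné meno = Francis Bacon",
            "|Plné meno = Sokrates ({{Cudzojazyčne|grc|????????|pc=n}})",
            "|Meno            = Svätý František z Assisi <br /> ''(Giovanni Battista Bernardone)''",
            "  |Meno = Friedrich Ludwig Gottlob Frege",
            "   |Meno             = Adam František Kollár (Kolárik)",
            "|meno    = [[J. Edgar Hoover|John Edgar Hoover]]",
            "|meno    = [[Benedikt XIV. (1740 – 1758)|Benedikt XIV.]]",
            "|meno    = [[Milan Rastislav Štefánik|Milan Rastislav Štefánik]]",
            "   |Meno             = '''Ján Filc'''",
            "  |Meno = Jean le Rond d'Alembert");
        for (String line : lines) {
            System.out.println(extractName(line));
        }
    }
}
```

One design consequence worth knowing: because replaceAll leaves a non-matching line as it is (for example a line with no "="), a caller that needs to tell "no name found" apart from a real name could instead call matches() on a Matcher for the same pattern and read group(1) only when it returns true.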

Tags: java, regex, wikipedia

Tunerx
