 

Notepad++ regex group capture

I have a text file like this:

ххх.prontube.ru
salo.ru
bbb.antichat.ru
yyy.ru
xx.bb.prontube.ru
zzz.com
srfsf.jwbefw.com.ua

I'm trying to delete all subdomains with this regex:

Find: .+\.((.*?)\.(ru|ua|com\.ua|com|net|info))$
Replace with: \1

I get:

prontube.ru
salo.ru
antichat.ru
yyy.ru
prontube.ru
zzz.com
com.ua

Why does the last line become com.ua instead of jwbefw.com.ua?

pnslg asked Jul 01 '13 22:07
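The behavior can be reproduced with a short Python sketch. It uses Python's `re` module rather than Notepad++'s own engine (an assumption: both are backtracking engines and, for this pattern, give the same result the question reports). The greedy `.+` first takes the whole line and then gives characters back from the right, so the engine accepts the rightmost dot that still leaves `something.ending` behind. For `srfsf.jwbefw.com.ua` that is the dot before `com`: `(.*?)` gets `com`, the alternation gets `ua`, and the match succeeds before the engine ever tries the split where `(.*?)` would be `jwbefw`. Listing `com\.ua` before `com` in the alternation does not help, because the split is decided by `.+`.

```python
import re

# The question's pattern, unchanged. Notepad++ runs it per line; here each
# line is its own string, so `$` simply means end of line.
QUESTION_PATTERN = re.compile(r".+\.((.*?)\.(ru|ua|com\.ua|com|net|info))$")

# Sample data from the question. Assumption: the leading "xxx" is typed as
# plain ASCII here, since the original characters appear not to be ASCII.
LINES = [
    "xxx.prontube.ru",
    "salo.ru",
    "bbb.antichat.ru",
    "yyy.ru",
    "xx.bb.prontube.ru",
    "zzz.com",
    "srfsf.jwbefw.com.ua",
]


def replace_per_line(lines, pattern, replacement):
    """Apply one find/replace to every line, similar to 'Replace All'."""
    return [pattern.sub(replacement, line) for line in lines]


if __name__ == "__main__":
    print(replace_per_line(LINES, QUESTION_PATTERN, r"\1"))

    # Inspect how the engine split the problem line into the three groups.
    m = QUESTION_PATTERN.match("srfsf.jwbefw.com.ua")
    print(m.group(1), m.group(2), m.group(3))  # com.ua com ua
```

The first print gives the same list as the question, ending in `com.ua`; lines with only one dot, such as `salo.ru`, do not match at all and are left unchanged.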


People also ask

How do I create a capture group in regex?

Capturing groups are a way to treat multiple characters as a single unit. They are created by placing the characters to be grouped inside a set of parentheses. For example, the regular expression (dog) creates a single group containing the letters "d", "o", and "g".
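A minimal Python sketch of grouping, using Python's `re` module for illustration (Notepad++ uses its own engine, but plain parentheses and `+` behave the same way in simple patterns like these).

```python
import re

# Parentheses make "dog" one unit, so + repeats the whole group.
grouped = re.fullmatch(r"(dog)+", "dogdogdog")

# Without them, + applies only to the preceding "g": "do" then one or more "g".
ungrouped = re.fullmatch(r"dog+", "dogdogdog")
only_gs = re.fullmatch(r"dog+", "doggg")

print(grouped is not None)    # True
print(ungrouped is not None)  # False
print(only_gs is not None)    # True

# The group also captures text; after repetition it holds the last repetition.
print(grouped.group(0), grouped.group(1))  # dogdogdog dog
```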

Can I use regex in Notepad++?

Yes. To find and replace text with a regex in Notepad++, open Find and Replace (Ctrl+H), make sure the 'Regular expression' search mode radio button is selected, and replace all matches with the desired string (or with nothing).

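What is the first capturing group in a regex?

Capturing groups are numbered by their opening parentheses, counting from the left, so the first capturing group is the one whose opening parenthesis comes first; in (abc)(def), group 1 matches abc. Plain (unescaped) parentheses group the regex between them; in Perl-style flavors such as the one Notepad++ uses, escaped parentheses \( and \) match literal parentheses instead. Groups capture the text matched by the regex inside them into a numbered group that can be reused with a numbered backreference, and they let you apply regex operators to the entire grouped regex.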
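A short Python sketch of group numbering and backreferences (Python's `re` stands in for the Notepad++ engine; the host name comes from the question's data). Groups are numbered by the position of their opening parenthesis, and the same numbers are used by `\1`-style backreferences in the pattern and in the replacement string.

```python
import re

# Three groups; numbering follows the order of the opening parentheses.
m = re.search(r"((\w+)\.(\w+))", "see prontube.ru")
print(m.group(1))  # prontube.ru  (outer group opens first)
print(m.group(2))  # prontube
print(m.group(3))  # ru

# A backreference in the pattern: \1 must repeat what group 1 captured.
doubled = re.search(r"\b(\w+) \1\b", "this is is a test")
print(doubled.group(0))  # is is

# The same numbers in a replacement string, as in the question's \1.
print(re.sub(r"(\w+)\.(\w+)", r"\2.\1", "prontube.ru"))  # ru.prontube
```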


1 Answer

This works without lookaround:

Find: [a-zA-Z0-9-.]+\.([a-zA-Z0-9-]+)\.([a-zA-Z0-9-]+)$
Replace: \1\.\2

It finds a host name with at least two periods where only letters, numbers, and dashes follow each of the last two periods, then replaces it with the last two parts. More intuitive, in my opinion.

There's something funny going on with that leading xxx. It doesn't appear to be plain ASCII. For the sake of this question, I'm going to assume that's just something funny with this site and not representative of your real data.
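A Python sketch that runs the answer's pattern over the question's data (assumptions: Python's `re` stands in for Notepad++'s engine, and lines are processed one at a time). Python's `re.sub` keeps unknown escapes such as `\.` in the replacement verbatim, backslash included, so the replacement here uses a plain dot. The last sample line assumes the odd prefix is Cyrillic `х` (U+0445), which is what the question's text appears to contain; the character classes cannot match it, so that line is left unchanged. Note also that keeping the last two labels turns `srfsf.jwbefw.com.ua` into `com.ua`, so a two-part ending such as `com.ua` would still need separate handling.

```python
import re

# The answer's find pattern, unchanged.
ANSWER_PATTERN = re.compile(r"[a-zA-Z0-9-.]+\.([a-zA-Z0-9-]+)\.([a-zA-Z0-9-]+)$")

# The answer's replacement is \1\.\2; Python would keep "\." literally,
# so this sketch uses a plain dot between the two groups.
REPLACEMENT = r"\1.\2"

LINES = [
    "xxx.prontube.ru",
    "salo.ru",
    "bbb.antichat.ru",
    "yyy.ru",
    "xx.bb.prontube.ru",
    "zzz.com",
    "srfsf.jwbefw.com.ua",
    "\u0445\u0445\u0445.prontube.ru",  # assumed Cyrillic prefix (U+0445 x3)
]


def keep_last_two_labels(line):
    """Replace a host with at least two dots by its last two labels."""
    return ANSWER_PATTERN.sub(REPLACEMENT, line)


if __name__ == "__main__":
    for line in LINES:
        print(f"{line} -> {keep_last_two_labels(line)}")
```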

Incorrect

Interestingly, I previously posted an incorrect answer here that accumulated a lot of upvotes, so I think I should preserve it:

Find: [a-zA-Z0-9-]+\.([a-zA-Z0-9-]+)\.(.+)$
Replace: \1\.\2

It just finds a host name with at least 2 periods in it, then replaces it with everything after the first dot.
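One way to see the problem with this earlier pattern, again as a Python sketch with `re` standing in for Notepad++'s engine: because the first character class excludes dots and the last group is `(.+)`, for hosts like these it drops exactly one leading label. That happens to give `jwbefw.com.ua` for the question's last line, but a host with two subdomain levels keeps one of them.

```python
import re

# The earlier pattern: a label, a dot, a label, a dot, then anything to end of line.
INCORRECT_PATTERN = re.compile(r"[a-zA-Z0-9-]+\.([a-zA-Z0-9-]+)\.(.+)$")


def drop_first_label(line):
    """Replace the line with everything after its first dot (needs two or more dots)."""
    # Python's re.sub would keep "\." literally, so a plain dot replaces the answer's \.
    return INCORRECT_PATTERN.sub(r"\1.\2", line)


if __name__ == "__main__":
    print(drop_first_label("srfsf.jwbefw.com.ua"))  # jwbefw.com.ua  (wanted)
    print(drop_first_label("xx.bb.prontube.ru"))    # bb.prontube.ru (still has "bb.")
    print(drop_first_label("salo.ru"))              # salo.ru        (no match, unchanged)
```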

jpmc26 answered Sep 23 '22 20:09