RegEx filter links from a document

Question

I am currently learning regex and I am trying to filter all links (e.g. http://www.link.com/folder/file.html) from a document with Notepad++. Actually, I want to delete everything else, so that in the end only the http links are listed.

So far I have tried this: http\:\/\/www\.[a-zA-Z0-9\.\/\-]+

This gives me all links, which is fine, but how do I delete the remaining stuff so that in the end I have a neat list of all links?

If I try to replace it with nothing followed by \1, the link is obviously deleted, but I want the exact opposite: to have everything else deleted.

So it should be something like:
- find a string of numbers, letters and special characters up to "http"
- delete what was found
- keep searching for more numbers, letters and special characters after "html"
- and delete that as well

Any ideas? Thanks so much.

psxls · Accepted Answer

In Notepad++, open the Replace dialog (Ctrl+H) and enter the following:

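Find: .*?(http\://www\.[a-zA-Z0-9\./\-]+)
Replace: $1\n
Options: select "Regular expression" and check ". matches newline"

This leaves you with a list of all your links, one per line (the \n in the replacement separates them; with $1 alone the links would be joined together). There are two issues though: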

The regex you provided for matching URLs is far from generic enough to match any URL. If it works in your case, that's fine; otherwise, check this question.
It leaves the text after the last matched URL intact. You have to delete that manually.
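
If a scripted alternative to the editor is acceptable, the same idea can be sketched in Python. This is only an illustration under assumptions: the sample text is made up, the pattern is the one from the question (so it still covers only http://www. links), and the replacement appends a newline after each link so they end up one per line. The first call mimics the find/replace above and shows the leftover text after the last link; the second collects only the matches, which avoids that leftover.

```python
import re

# URL pattern taken from the question; it only covers http://www. links
URL = r"http://www\.[a-zA-Z0-9./-]+"

# Hypothetical sample text, made up for illustration
text = (
    "Intro text http://www.link.com/folder/file.html and more words\n"
    "second line http://www.example.com/a-b/c.html trailing text"
)

# Same idea as the find/replace: drop everything before each link.
# re.DOTALL plays the role of the ". matches newline" option.
replaced = re.sub(r".*?(" + URL + r")", r"\1\n", text, flags=re.DOTALL)

# Alternative: collect only the matches, so nothing is left over.
links = re.findall(URL, text)

print(repr(replaced))  # the text after the last link is still there
print("\n".join(links))
```

Because the character class includes the dot, a link that ends a sentence would pick up the trailing period, so the pattern may need tightening for real documents.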

Tags: regex, notepad++

Phillip
