Match line breaks with a regular expression

Tags:

regex

The text:

<li><a href="#">Animal and Plant Health Inspection Service Permits
Provides information on the various permits that the Animal and Plant Health Inspection Service issues as well as online access for acquiring those permits.

I want to use a regular expression to insert </a> at the end of Permits. It just so happens that all of my similar blocks of HTML/text already have a line break in them. I believe I need to find a line break \n where the line contains (or starts with) <li><a href="#">.

The Muffin Man asked Mar 04 '11 23:03


People also ask

Does match newline regex?

By default in most regex engines, . doesn't match newline characters, so the matching stops at the end of each logical line. If you want . to match really everything, including newlines, you need to enable "dot-matches-all" mode in your regex engine of choice (for example, add the re.DOTALL flag in Python, or the /s modifier in PCRE).

How do you match everything including newline regex?

The dot matches all except newlines (\r\n). So use [\s\S], a character class that will match ALL characters.

What is multiline in regex?

Multiline option, or the m inline option, enables the regular expression engine to handle an input string that consists of multiple lines. It changes the interpretation of the ^ and $ language elements so that they match the beginning and end of a line, instead of the beginning and end of the input string.
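A minimal sketch of that difference, assuming Python's re module (the sample string is made up for illustration): the same anchored pattern finds nothing across two lines by default, and finds each line once multiline mode is on.

```python
import re

# Hypothetical two-line input, not taken from the question.
sample = "one\ntwo"

# Default mode: ^ and $ anchor to the start and end of the whole string,
# so a pattern that must span exactly one word finds nothing here.
default_matches = re.findall(r"^\w+$", sample)

# Multiline mode: ^ and $ also anchor at each line break.
multiline_matches = re.findall(r"^\w+$", sample, flags=re.MULTILINE)

print(default_matches)
print(multiline_matches)
```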

What does '$' mean in regex?

For instance, you might need to search for a dollar sign ("$") as part of a price list, or in a computer program as part of a variable name. Since the dollar sign is a metacharacter that means "end of line" in regex, you must escape it with a backslash (\$) to use it literally.


2 Answers

You could search for:

<li><a href="#">[^\n]+ 

And replace with:

$0</a> 

Where $0 is the whole match. The exact syntax will depend on the language you are using, though; some engines spell the whole-match reference differently (for example $& or \0).
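As a rough sketch of this search and replace in one particular language, here is how it might look in Python. The input is shortened from the question, and the line break after "Permits" is an assumption based on the asker's description. Note that Python writes the whole-match reference as \g<0> rather than $0.

```python
import re

# Sample input modeled on the question; the position of the line break
# (right after "Permits") is assumed from the asker's description.
html = (
    '<li><a href="#">Animal and Plant Health Inspection Service Permits\n'
    'Provides information on the various permits.'
)

# [^\n]+ consumes everything up to, but not including, the first line break.
pattern = re.compile(r'<li><a href="#">[^\n]+')

# \g<0> is Python's spelling of "the whole match" in a replacement string.
result = pattern.sub(r'\g<0></a>', html)
print(result)
```

If the files use Windows line endings, the \r before each \n would still be captured by [^\n]+, so [^\r\n]+ may be the safer class there.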


WARNING: You should avoid parsing HTML with regex. Here's why.

NullUserException answered Oct 03 '22 07:10


By default . (any character) does not match newline characters.

This means you can simply match zero or more of any character then append the end tag.

Find: <li><a href="#">.*

Replace: $0</a>
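To see why this relies on the default behaviour of the dot, here is a small Python sketch (the shortened sample text is made up, and \g<0> stands in for $0). With default flags the match stops at the line break; with dot-matches-all enabled it runs to the end of the input and the tag lands in the wrong place.

```python
import re

# Shortened, hypothetical stand-in for the asker's HTML.
text = '<li><a href="#">Permits\nProvides information.'

find = r'<li><a href="#">.*'
replace = r'\g<0></a>'

# Default: . does not match \n, so .* stops at the end of the first line.
default_result = re.sub(find, replace, text)

# DOTALL: . matches \n too, so .* swallows the rest of the input.
dotall_result = re.sub(find, replace, text, flags=re.DOTALL)

print(default_result)
print(dotall_result)
```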

mickmackusa answered Oct 03 '22 07:10