Finding line beginning using regular expression

Question

Finding Line Beginning using Regular expression in Notepad++

I want to strip all the jQuery "done" attributes from the divs in a 4000-line HTML file.

<DIV class=menu done27="1" done26="0"
done9="1" done8="0" done7="1"
done6="0" done4="20">

should be replaced with:

<DIV class=menu>

In my experiments, this regular expression does most of the job:

[ ^]done[0-9]+="[0-9]+"

I'm using Notepad++ 5.6.8 Unicode, with a file encoded in ANSI, and I put this regex in the "Find what" field. It only replaces the 5 occurrences that start with a space; it misses the 2 occurrences that start at the beginning of a line.

How can I construct a regex to remove all the attributes of an HTML element starting with a keyword?
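For anyone who wants to reproduce the symptom outside Notepad++, here is a minimal Python sketch. It assumes Python's `re` module treats the pattern the same way Notepad++ did here, which I have only checked in Python: inside square brackets, `^` is not a line anchor but a literal caret, so `[ ^]` means "a space or a caret". The variable names `html` and `attempt` are mine.

```python
import re

# Sample from the question, assuming CR+LF line ends as in the original file.
html = ('<DIV class=menu done27="1" done26="0"\r\n'
        'done9="1" done8="0" done7="1"\r\n'
        'done6="0" done4="20">')

# Inside brackets, ^ is a literal caret, so this class is "space or caret".
attempt = re.compile(r'[ ^]done[0-9]+="[0-9]+"')

matches = attempt.findall(html)
leftover = attempt.sub('', html)

print(len(matches))    # only the attributes preceded by a space
print(repr(leftover))  # done9 and done6, at line starts, are still there
```

The two attributes that survive are exactly the ones that follow a line break rather than a space.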

Michel Merlin · Accepted Answer

Extended Replace "\r\n" with "LINEBREAK "

Thanks a lot to all for these timely replies. Following your advice, here's what I did:

"Notepad++ > View > Show Symbol > Show End Of Line" shows "CR+LF" at each line end.
"Notepad++ > Search > Find", "Search mode" = "Normal": made sure that "Find what" = "LINEBREAK" finds nothing.
"Search mode" = "Extended": "Find what" = " " only finds the double breaks (CR + LF + a blank line); " " finds nothing; yet "\r\n" finds exactly all the line breaks, and only them.
Saved my "Towncar.htm" test file as "Towncar_02.htm" (also encoded in ANSI) as a backup.
Under "Extended", replaced all "\r\n" with "LINEBREAK " (notice the trailing space).
Under "Regular expression", replaced each occurrence of:
```
 done[0-9]*="[0-9]*"
```

(Be careful to check that there is a LEADING SPACE before "done"
and that there is NO TRAILING SPACE! See the notes below.)

with an empty string

Under "Extended", replaced each occurrence of "LINEBREAK" with "\r\n" (no trailing space this time after "LINEBREAK"!).
Checked that the resulting "Towncar.htm" file (after some cosmetic reformatting) looked OK and pretty, and that after a refresh it still rendered the same as the "Towncar_02.htm" backup.
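The steps above can be sketched in Python to check the logic on the sample from the question. This is only an illustration under assumptions, not what Notepad++ does internally: the function name `strip_done_attributes` is hypothetical, the line ends are assumed to be CR+LF, and the marker word "LINEBREAK" is assumed to be absent from the file (the sketch refuses to run otherwise, mirroring the "Normal" search check).

```python
import re

MARKER = 'LINEBREAK'  # placeholder word, must not already occur in the file


def strip_done_attributes(text):
    """Hypothetical helper mirroring the three Notepad++ replace passes."""
    if MARKER in text:
        raise ValueError('marker already present in the text')
    # Pass 1 ("Extended"): every CR+LF becomes the marker plus a space,
    # so an attribute that began a line is now preceded by a space.
    joined = text.replace('\r\n', MARKER + ' ')
    # Pass 2 ("Regular expression"): leading space, no trailing space.
    stripped = re.sub(r' done[0-9]*="[0-9]*"', '', joined)
    # Pass 3 ("Extended"): put the line breaks back.
    return stripped.replace(MARKER, '\r\n')


sample = ('<DIV class=menu done27="1" done26="0"\r\n'
          'done9="1" done8="0" done7="1"\r\n'
          'done6="0" done4="20">')

result = strip_done_attributes(sample)
print(repr(result))  # '<DIV class=menu\r\n\r\n>'
```

The line breaks left between "menu" and ">" are the kind of thing the cosmetic reformatting takes care of afterwards.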

Reminders and notes:

This forum apparently works well in Chrome 4; but with some browsers (e.g. IE6 and other discontinued ones), under some circumstances, it causes some artifacts; so, be careful:
even if the forum doesn't show it in your browser, there is a leading space at the very beginning of the regex (the " done..." regular expression above), so that it replaces only strings starting with " done", leading space included; this makes it even surer NOT to alter any other strings containing "undone", "methadone" or the like
in the same way, even if the forum shows one in your browser, there is no trailing space at the end of the regex!
in the regex, [0-9] matches 1 and only 1 occurrence of any decimal digit (characters in the 0-9 range); IOW it matches « 0 » or « 1 » or « 9 » etc, but NOT « 01 » or « 835 » or « » (the empty string) or anything else.
* (asterisk) matches the previous element 0 or more times (here the element is [0-9], so [0-9]* matches the empty string or any string made exclusively of digits)
likewise, + (plus sign) matches the previous element 1 or more times (here [0-9]+ matches any string, at least 1 character long, made exclusively of digits)
Ref: http://sourceforge.net/apps/mediawiki/notepad-plus/index.php?title=Regular_Expressions#Notepad.2B.2B_regex_syntax
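To double-check the notes on [0-9], * and +, here is a small Python sketch. It assumes Python's `re` module behaves like the Notepad++ regex engine for these three constructs; the helper name `whole` is mine.

```python
import re

digit = re.compile(r'[0-9]')   # exactly one digit
star = re.compile(r'[0-9]*')   # zero or more digits
plus = re.compile(r'[0-9]+')   # one or more digits


def whole(pattern, text):
    """True when the pattern matches the entire text."""
    return pattern.fullmatch(text) is not None


for text in ['7', '835', '']:
    print(repr(text), whole(digit, text), whole(star, text), whole(plus, text))

# The leading space keeps look-alike attribute names out of reach.
done = re.compile(r' done[0-9]*="[0-9]*"')
print(done.search(' undone5="1"'))      # None
print(done.search(' methadone1="2"'))   # None
```

Only `[0-9]*` accepts the empty string, which is why `done=""` would also be removed by the answer's pattern but not by one written with +.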


Tags:

regex

notepad++