Finding line beginning using regular expression in Notepad++

I want to strip all the jQuery "done" attributes from the DIV elements of a 4000-line HTML file.

<DIV class=menu done27="1" done26="0"
done9="1" done8="0" done7="1"
done6="0" done4="20">

should be replaced with:

<DIV class=menu>

In my experiment, I tried to do it with this regular expression:

[ ^]done[0-9]+="[0-9]+"

Using Notepad++ 5.6.8 Unicode, with a file encoded in ANSI, I put this regex in the "Find what" field. It only replaces the 5 occurrences preceded by a space; it misses the 2 occurrences at the beginning of a line.

How can I construct a regex that removes all the attributes of an HTML element whose names start with a given keyword, including those at the beginning of a line?
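For comparison, here is a minimal sketch using Python's re module (not Notepad++ 5.6.8, whose older engine, as the answer's workaround suggests, works on one line at a time). It shows why the attempt fails: inside a character class, ^ is only special as the first character (negation), so in [ ^] it is a literal caret and the class matches a space or "^", never a line start. It then shows two patterns that do catch all 7 attributes. The variable names are illustrative only.

```python
import re

# The <DIV> from the question, with Windows (CR+LF) line endings.
sample = (
    '<DIV class=menu done27="1" done26="0"\r\n'
    'done9="1" done8="0" done7="1"\r\n'
    'done6="0" done4="20">'
)

# The question's attempt: "^" is not the first character inside [...],
# so it is a literal caret and "[ ^]" matches only " " or "^".
naive_result, naive_count = re.subn(r'[ ^]done[0-9]+="[0-9]+"', "", sample)

# Variant A: \s+ also matches CR/LF, so an attribute at the start of a
# line is removed together with the line break before it.
fixed_result, fixed_count = re.subn(r'\s+done[0-9]+="[0-9]+"', "", sample)

# Variant B: with re.MULTILINE, "^" also matches right after each "\n",
# so "(?:^| )" means "at a line start or after a space"; line breaks are kept.
anchored_result, anchored_count = re.subn(
    r'(?:^| )done[0-9]+="[0-9]+"', "", sample, flags=re.MULTILINE
)

print(naive_count, repr(naive_result))        # 5 '<DIV class=menu\r\ndone9="1"\r\ndone6="0">'
print(fixed_count, repr(fixed_result))        # 7 '<DIV class=menu>'
print(anchored_count, repr(anchored_result))  # 7 '<DIV class=menu\r\n\r\n>'
```

Variant A yields the exact `<DIV class=menu>` asked for; variant B is the literal "line beginning" fix but leaves the empty line breaks behind.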

Michel Merlin asked Apr 21 '10 08:04


1 Answer

Workaround: in Extended mode, first replace "\n" with "LINEBREAK "

Thanks a lot to everyone for these timely replies. Following your advice, here's what I did:

  • "Notepad++ > View > Show Symbol > Show End Of Line" shows "CR+LF" at each line end.
  • "Notepad++ > Search > Find", "Search mode" = "Normal": made sure that "Find what" = "LINEBREAK" finds nothing (so the placeholder text does not already occur in the file)
  • "Search mode" = "Extended": "Find what" = "\n\r" only finds the double breaks (CR + LF followed by a blank line); "\n \r" finds nothing; but "\n" finds exactly all line breaks, and only them.
  • Saved a copy of my "Towncar.htm" test file as "Towncar_02.htm" (also encoded in ANSI) as a backup
  • Under "Extended", replaced all "\n" with "LINEBREAK " (notice the trailing space)
  • Under "Regular expression", replaced each occurrence of the following regex with an empty string:

     done[0-9]*="[0-9]*"

    (Be careful: there is a LEADING SPACE before "done" and there is NO TRAILING SPACE! See below.)

  • Under "Extended", replaced each occurrence of "LINEBREAK" with "\n" (no trailing space this time after "LINEBREAK"!)
  • Checked that the resulting "Towncar.htm" file (after some cosmetic reformatting) looked OK and pretty, and that after a refresh it still rendered the same as the "Towncar_02.htm" backup.
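The placeholder steps above can be sketched in Python on an in-memory string. This only illustrates the idea; it is not what Notepad++ does internally, and `strip_done_attributes` and `PLACEHOLDER` are hypothetical names introduced here.

```python
import re

# Hypothetical marker; the answer first checks that it does not occur in the file.
PLACEHOLDER = "LINEBREAK"


def strip_done_attributes(text: str) -> str:
    """Replay the answer's three Notepad++ passes on an in-memory string."""
    if PLACEHOLDER in text:
        # Pass 3 would turn existing "LINEBREAK" text into line breaks.
        raise ValueError("placeholder already occurs in the input")
    # Pass 1 (Extended mode): "\n" -> "LINEBREAK " (trailing space), so every
    # "done" that started a line is now preceded by a space.
    text = text.replace("\n", PLACEHOLDER + " ")
    # Pass 2 (Regular expression mode): delete each ' done<digits>="<digits>"'.
    text = re.sub(r' done[0-9]*="[0-9]*"', "", text)
    # Pass 3 (Extended mode): "LINEBREAK" -> "\n" (no trailing space).
    return text.replace(PLACEHOLDER, "\n")


sample = (
    '<DIV class=menu done27="1" done26="0"\r\n'
    'done9="1" done8="0" done7="1"\r\n'
    'done6="0" done4="20">'
)
print(repr(strip_done_attributes(sample)))  # '<DIV class=menu\r\n\r\n>'
```

The leftover CR+LF pairs (and, on lines that did not start with "done", a leading space from "LINEBREAK ") are what the cosmetic reformatting step tidies up.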

Reminders and notes:

  • This forum apparently works well in Chrome 4, but in some browsers (e.g. IE6 and other discontinued ones) it can, under some circumstances, display artifacts; so be careful:
  • even if the forum doesn't show it in your browser, there is a leading space at the beginning of the regex (the " done..." regular expression above), and it is part of the regex: only strings starting with " done", space included, are replaced, which makes extra sure NOT to alter other strings such as "undone" or "methadone";
  • likewise, even if the forum shows one in your browser, there is no trailing space at the end of the regex!
  • in the regex, [0-9] matches exactly one occurrence of any decimal digit (a character in the 0-9 range); in other words, it matches « 0 » or « 1 » or « 9 », etc., but NOT « 01 » or « 835 » or « » (the empty string).
  • * (asterisk) matches the preceding item 0 or more times (here it matches the empty string or any string made exclusively of digits)
  • likewise, + (plus sign) matches the preceding item 1 or more times (here it matches any string, at least 1 character long, made exclusively of digits)
    Ref: http://sourceforge.net/apps/mediawiki/notepad-plus/index.php?title=Regular_Expressions#Notepad.2B.2B_regex_syntax
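The quantifier and leading-space notes can be checked quickly with Python's re module. This assumes Python treats [0-9], * and + the same way as the Notepad++ engine for these simple inputs, which is expected for such basic constructs but not verified against Notepad++ 5.6.8 itself.

```python
import re

# [0-9]: exactly one decimal digit.
one_digit = [s for s in ["0", "9", "01", "835", ""] if re.fullmatch(r"[0-9]", s)]
# [0-9]*: zero or more digits, so the empty string also matches.
star = [s for s in ["", "835", "8a"] if re.fullmatch(r"[0-9]*", s)]
# [0-9]+: one or more digits, so the empty string does not match.
plus = [s for s in ["", "835", "8a"] if re.fullmatch(r"[0-9]+", s)]

# The leading space keeps "undone"/"methadone" attributes from being touched.
pattern = re.compile(r' done[0-9]*="[0-9]*"')
candidates = ['<b done3="1">', '<b undone3="1">', '<b methadone3="1">']
hits = [s for s in candidates if pattern.search(s)]

print(one_digit)  # ['0', '9']
print(star)       # ['', '835']
print(plus)       # ['835']
print(hits)       # ['<b done3="1">']
```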
Michel Merlin answered Oct 11 '22 14:10