I am loading HTML emails. First I remove the HTML tags, replace each &nbsp; with a space, and collapse double spaces into a single space. That works.
But now I have a lot of empty lines that I cannot remove. I have seen examples that remove empty lines while reading a file, but I don't have any empty lines until after I remove the HTML tags and the spaces.
I do:
$m = [IO.File]::ReadAllText("$emailFolder\$fName")
$m = $m -replace "<((?!@).)*?>" # removes all HTML tags, but not addresses such as <[email protected]>
$m = $m -replace "&nbsp;"," "
$m = $m.Replace('  ',' ').Replace('  ',' ').Replace('  ',' ')
$m = $m.Replace('`r','').Replace('`n`n','`n').Replace('`n`n','`n') # does nothing :(
I tried various versions; none of them removed the empty lines. Any idea how I can achieve that?
Besides that, I tried to use a regex quantifier to match several spaces in a row and failed.
What am I doing wrong?
$m = $m.Replace(' +',' ') # does not work
$m = $m.Replace('\s+',' ') # does not work either
If I understand you correctly, you don't want to remove all line breaks, just "empty" lines (lines that consist of nothing but whitespace).
Consider this sample string:
$multiLine = "Line 1`r`nLine 2`nLine 3`r`n`r`n `n `t `r`nLine 7`r`n"
When displayed, it will look like this on screen:
Line 1
Line 2
Line 3



Line 7
Line 4 is actually a blank line, with nothing but a CRLF. Line 5 is a space followed by a single LF, and line 6 is a space, a tab, and a space, then a CRLF. I mixed line endings because HTML can be a mess; it's good to be prepared for anything!
To handle all of these, you can do a replace like this:
$multiLine -creplace '(?m)^\s*\r?\n',''
-creplace is just the case-sensitive version of -replace (I like to be explicit).
(?m) is an inline way to set regular expression modes. The m mode stands for multi-line, and it lets the ^ and $ anchors match the beginning/end of each line in a string (rather than the beginning and end of the string). This is the key to your issue, I think.
The pattern uses ^ to match the beginning of each line, then matches 0 or more whitespace characters using the \s class, which includes tab.
\r?\n then matches the line ending itself, with or without a carriage return.
Because of multi-line mode, ^ will catch them throughout the string.
The result:
Line 1
Line 2
Line 3
Line 7
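To see what the m mode changes, and to check the result above, here is a minimal sketch. The $text, $withoutM, $withM, $cleaned and $visible names are made up for illustration; $multiLine is the sample string from above.

```powershell
# Without (?m), ^ matches only at the very start of the string.
$text = "a`nb`nc"
$withoutM = [regex]::Matches($text, '^\w').Count       # 1
$withM    = [regex]::Matches($text, '(?m)^\w').Count   # 3

# The sample string from above, with mixed CRLF/LF endings.
$multiLine = "Line 1`r`nLine 2`nLine 3`r`n`r`n `n `t `r`nLine 7`r`n"

# Remove every line that contains nothing but whitespace.
$cleaned = $multiLine -creplace '(?m)^\s*\r?\n', ''

# Make the remaining line endings visible for inspection.
$visible = $cleaned.Replace("`r", '<CR>').Replace("`n", '<LF>')
$visible   # Line 1<CR><LF>Line 2<LF>Line 3<CR><LF>Line 7<CR><LF>
```

The line endings of the surviving lines are left as they were, so a file with mixed CRLF and LF endings stays mixed.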
This seems to work:
$m -replace '(?ms)(?:\r|\n)^\s*$'
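Both patterns above rely on the -replace operator, which also seems to explain why the attempts in the question had no effect. Single-quoted PowerShell strings are verbatim, so '`n' is a backtick followed by the letter n rather than a line feed, and the .NET String.Replace() method does literal text replacement, so ' +' and '\s+' are never treated as patterns. A minimal sketch follows; the $m value is a made-up stand-in for the stripped email text, and the other variable names are hypothetical too.

```powershell
$multiLine = "Line 1`r`nLine 2`nLine 3`r`n`r`n `n `t `r`nLine 7`r`n"

# Single-quoted strings are verbatim: '`n' is a backtick plus "n", not a line feed,
# so this call finds nothing to replace and returns the input unchanged.
$singleQuoted = $multiLine.Replace('`n`n', '`n')

# Hypothetical stand-in for the question's $m after the tags were stripped.
$m = "Hello    world,  this   is a test"

# String.Replace() is literal: it searches for a space followed by a plus sign.
$literal = $m.Replace(' +', ' ')

# The -replace operator treats its first operand as a regex,
# so ' +' collapses any run of spaces into one.
$collapsed = $m -replace ' +', ' '   # Hello world, this is a test

# The pattern from this answer, run on the sample string.
# Since ^ can only follow a line feed, each match starts at an LF.
$viaSecond = $multiLine -replace '(?ms)(?:\r|\n)^\s*$'
```

When collapsing whitespace, [ \t]+ may be a safer choice than \s+, because \s also matches line breaks and would join lines together.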