Match any character (including newlines) in sed

Tags: html, replace, newline, sed, coding-style

I have a sed command that I want to run on a huge, terrible, ugly HTML file that was created from a Microsoft Word document. All it should do is remove every instance of a string like this one:

style='text-align:center; color:blue;
exampleStyle:exampleValue'

The sed command that I am trying to modify is:

sed "s/ style='[^']*'//" fileA > fileB

It works great, except that whenever there is a newline inside the matching text, it doesn't match. Is there a modifier for sed, or something else I can do, to force it to match any character, including newlines?
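To reproduce the symptom, here is a small sketch that runs the command above on made-up sample input. The markup and the wrapper function name `strip_styles_sed` are illustrative only. The single-line attribute is removed, while the one split across two lines is left untouched.

```shell
#!/bin/sh
# The original substitution, wrapped in a hypothetical helper for reuse.
strip_styles_sed() {
  sed "s/ style='[^']*'//"
}

# Made-up sample: the second style attribute spans two lines.
printf '%s\n' \
  "<p style='text-align:center'>one</p>" \
  "<p style='text-align:center; color:blue;" \
  "exampleStyle:exampleValue'>two</p>" | strip_styles_sed
```

Only `<p>one</p>` loses its attribute; the other two lines come out unchanged, because no single line contains both the opening and the closing quote.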

I understand that regexps are terrible at XML and HTML, blah blah blah, but in this case, the string patterns are well-formed in that the style attributes always start with a single quote and end with a single quote. So if I could just solve the newline problem, I could cut down the size of the HTML by over 50% with just that one command.

In the end, it turned out that Sinan Ünür's Perl script worked best. It was almost instantaneous, and it reduced the file size from 2.3 MB to 850 KB. Good ol' Perl...


asked Jul 22 '09 12:07

Cory McHugh

1 Answer

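sed goes over the input file line by line, stripping each line's trailing newline before it runs your script, so a plain s/// command never sees a newline. As I understand it, that means what you want is not possible with a simple sed substitution.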
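A quick way to see this behaviour, sketched with sed's standard `l` command (which prints the pattern space unambiguously and marks its end with `$`): each cycle holds exactly one line with no newline in it, so even a substitution that targets `\n` directly finds nothing to replace.

```shell
#!/bin/sh
# Each cycle's pattern space is a single line; "l" shows it ends right
# after the text, i.e. the newline has already been removed.
printf 'first line\nsecond line\n' | sed -n 'l'

# Consequently a pattern that looks for a newline never matches here.
printf 'first line\nsecond line\n' | sed 's/\n/X/'
```

The first command prints `first line$` and `second line$`; the second prints both lines unchanged.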

You could use the following Perl script (untested), though:

#!/usr/bin/perl

use strict;
use warnings;

{
    local $/; # slurp mode
    my $html = <>;
    $html =~ s/ style='[^']*'//g;
    print $html;
}

__END__

A one-liner would be:

$ perl -e 'local $/; $_ = <>; s/ style=\047[^\047]*\047//g; print' fileA > fileB


answered Nov 15 '22 18:11

Sinan Ünür
