How can I delete characters between < and > in Perl?

Tags:

regex

multiline

perl

I need to write a Perl script that reads a file and deletes everything between < and > (inclusive), even when the < and the > are on different lines. That is, if the input is:

Hello, world. I <enjoy eating
bagels. They are quite tasty.
I prefer when I ate a bagel to
when I >ate a sandwich. <I also
like >bananas.

I want the output to be:

Hello, world. I ate a sandwich. bananas.

I know how to do this with a regex when the text is all on one line, but I don't know how to do it across multiple lines. Ultimately I need to be able to conditionally delete parts of a template so I can generate parameterized config files. I thought Perl would be a good language for this, but I am still getting the hang of it.

Edit: I also need to handle more than one <...> pair.


asked Apr 10 '09 14:04

rlbond

2 Answers

You may want to check out the Perl module Text::Balanced, which is part of the core distribution and should help here. In general, you want to avoid regexes for this sort of thing if the text is likely to contain nested delimiters (a <...> inside another <...>), because the pattern gets very messy.
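A minimal sketch of the Text::Balanced approach, assuming a nested <...> pair should be removed as a single unit and an unmatched < should be left in place. strip_balanced is a hypothetical helper name, not part of the module: it scans for each <, calls extract_bracketed in list context with '<>' as the only bracket pair and an empty prefix pattern so the match must start exactly at that <, and drops the extracted span. The last lines compare it with the simple regex on a nested example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

# Hypothetical helper: remove every balanced <...> span, including nested
# ones such as <a <b> c>. An unbalanced '<' with no matching '>' is kept.
sub strip_balanced {
    my ($text) = @_;
    my $out = '';
    while ((my $pos = index($text, '<')) >= 0) {
        $out .= substr($text, 0, $pos);      # keep everything before the '<'
        my $tail = substr($text, $pos);      # plain copy starting at the '<'
        # List context returns (extracted, remainder, prefix); the prefix
        # pattern '' means the bracket must begin at the start of $tail.
        my ($span, $rest) = extract_bracketed($tail, '<>', '');
        if (defined $span) {
            $text = $rest;                   # drop the bracketed span
        } else {
            $out .= '<';                     # unbalanced: keep the '<', move on
            $text = substr($text, $pos + 1);
        }
    }
    return $out . $text;
}

my $nested = "a <b <c> d> e";
print strip_balanced($nested), "\n";         # prints "a  e"

# The simple regex stops at the first '>', leaving part of the outer span.
(my $naive = $nested) =~ s/<[^>]*>//g;
print $naive, "\n";                          # prints "a  d> e"
```

To use it on a file, read the whole file into one string first and print the result of strip_balanced on that string.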


answered Nov 08 '22 13:11

Danny

In Perl:

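#!/usr/bin/perl
use strict;
use warnings;

# Slurp the whole input (the file named on the command line, or STDIN)
# into one string, so that a <...> span can cross line breaks.
my $text = do { local $/; <> };

$text =~ s/<[^>]*>//g;
print $text;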

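The regex matches everything from a < through the first following > (inclusive) and replaces it with nothing. Because [^>] is a negated character class, it also matches newlines, so a match can span several lines as long as the whole file has been read into $text as one string. The /g modifier makes the substitution global, so every <...> pair is removed, not just the first.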
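To see why the negated character class matters, here is a small self-contained check, assuming the question's sample input is held in a string rather than read from a file. It compares [^>]* with two common alternatives: .*? without the /s modifier, which cannot cross a newline and so removes nothing, and a greedy .* with /s, which swallows everything from the first < to the last >.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The question's sample input as one string; both <...> spans cross newlines.
my $text = "Hello, world. I <enjoy eating\nbagels. They are quite tasty.\n"
         . "I prefer when I ate a bagel to\nwhen I >ate a sandwich. <I also\n"
         . "like >bananas.\n";

# [^>]* matches any character except '>', including "\n", so no /s is needed.
(my $stripped = $text) =~ s/<[^>]*>//g;
print $stripped;    # prints "Hello, world. I ate a sandwich. bananas."

# Without /s, '.' does not match "\n", so this pattern finds nothing to remove.
(my $dot = $text) =~ s/<.*?>//g;

# With /s but a greedy .*, one match runs from the first '<' to the LAST '>'.
(my $greedy = $text) =~ s/<.*>//s;
print $greedy;      # prints "Hello, world. I bananas."
```

A lazy <.*?> with the /s modifier would also work on this input, but [^>]* states the intent directly and does not depend on modifiers.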

EDIT: incorporated comments from Hynek and chaos

answered Nov 08 '22 13:11

Gene Gotimer
