
How to input a regex in string.replace?

Tags: python, string, regex, replace

I need some help writing a regex. My input looks like this:

this is a paragraph with<[1> in between</[1> and then there are cases ... where the<[99> number ranges from 1-100</[99>.  and there are many other lines in the txt files with<[3> such tags </[3>

The required output is:

this is a paragraph with in between and then there are cases ... where the number ranges from 1-100.  and there are many other lines in the txt files with such tags

I've tried this:

```python
#!/usr/bin/python
import os, sys, re, glob

for infile in glob.glob(os.path.join(os.getcwd(), '*.txt')):
    # Open each file so there is a `reader` to iterate over.
    with open(infile) as reader:
        for line in reader:
            line2 = line.replace('<[1> ', '')
            line = line2.replace('</[1> ', '')
            line2 = line.replace('<[1>', '')
            line = line2.replace('</[1>', '')
            print(line)
```

I've also tried this (but it seems like I'm using the wrong regex syntax):

```python
line2 = line.replace('<[*> ', '')
line = line2.replace('</[*> ', '')
line2 = line.replace('<[*>', '')
line = line2.replace('</[*>', '')
```

I don't want to hard-code a replace call for every number from 1 to 99.

asked Apr 14 '11 03:04

alvas

2 Answers

This tested snippet should do it:

```python
import re
line = re.sub(r"</?\[\d+>", "", line)
```
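Applied to the file loop from the question, a minimal sketch could look like the following. It assumes, as in the question, that the .txt files sit in the current working directory, and it only prints the cleaned lines rather than writing them back. The names TAG_RE and strip_tags are illustrative, not from the original answer.

```python
import glob
import os
import re

# Matches opening tags like <[1> and closing tags like </[99>.
TAG_RE = re.compile(r"</?\[\d+>")


def strip_tags(text):
    """Remove every <[N> and </[N> tag from text (hypothetical helper)."""
    return TAG_RE.sub("", text)


if __name__ == "__main__":
    for infile in glob.glob(os.path.join(os.getcwd(), "*.txt")):
        with open(infile) as reader:
            for line in reader:
                # end="" because each line read from the file keeps its newline.
                print(strip_tags(line), end="")
```

Compiling the pattern once outside the loop avoids re-parsing it for every line, although the re module also caches recently used patterns.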

Edit: Here's a commented version explaining how it works:

```python
line = re.sub(r"""(?x) # Use free-spacing mode (this flag must come first).
    <    # Match a literal '<'
    /?   # Optionally match a '/'
    \[   # Match a literal '['
    \d+  # Match one or more digits
    >    # Match a literal '>'
    """, "", line)
```
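As a quick check, this sketch runs both the compact pattern and a free-spacing version on part of the sample input from the question. It passes re.VERBOSE as a flag instead of writing the inline (?x), which should be equivalent; the variable names are illustrative.

```python
import re

sample = ("this is a paragraph with<[1> in between</[1> and then there are cases ... "
          "where the<[99> number ranges from 1-100</[99>.")

# Compact form from the answer.
compact = re.sub(r"</?\[\d+>", "", sample)

# Same pattern in free-spacing form; whitespace and comments are ignored.
verbose_pattern = r"""
    <    # literal '<'
    /?   # optional '/'
    \[   # literal '['
    \d+  # one or more digits
    >    # literal '>'
"""
verbose = re.sub(verbose_pattern, "", sample, flags=re.VERBOSE)

print(compact)
print(compact == verbose)  # True
```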

Regexes are fun! But I would strongly recommend spending an hour or two studying the basics. For starters, you need to learn which characters are special: these "metacharacters" must be escaped (i.e., preceded by a backslash) to be matched literally, and the rules differ inside and outside character classes. There is an excellent online tutorial at www.regular-expressions.info. The time you spend there will pay for itself many times over. Happy regexing!
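To illustrate the point about metacharacters, here is a small sketch using the question's own attempt. Read as a regex, '<[*>' is invalid because '[' opens a character class that is never closed; escaping the bracket by hand, or with re.escape(), makes it match literally.

```python
import re

# '[' opens a character class, so '<[*>' is not a valid pattern:
# the class is never closed with ']'.
try:
    re.compile("<[*>")
    broken = False
except re.error:
    broken = True

# A backslash makes '[' match a literal '['.
literal = re.compile(r"<\[1>")

# re.escape() inserts the needed backslashes when matching fixed text.
escaped = re.compile(re.escape("<[1>"))

print(broken)                                # True
print(bool(literal.search("with<[1> in")))   # True
print(bool(escaped.search("with<[1> in")))   # True
```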

answered Oct 11 '22 06:10

ridgerunner

str.replace() only does fixed (literal) replacements; it does not understand regex syntax. Use re.sub() instead.
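A short sketch of the difference, using a line from the question: str.replace() searches for the exact characters '<[*>', which never occur, so nothing changes, while re.sub() treats its first argument as a pattern.

```python
import re

line = "the<[99> number ranges from 1-100</[99>."

# str.replace() looks for the literal text '<[*>', which is not in the line.
unchanged = line.replace("<[*>", "")

# re.sub() interprets the pattern: \d+ matches the 99 in both tags.
cleaned = re.sub(r"</?\[\d+>", "", line)

print(unchanged == line)  # True
print(cleaned)            # the number ranges from 1-100.
```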

answered Oct 11 '22 06:10

Ignacio Vazquez-Abrams
