I need some help on declaring a regex. My inputs are like the following:
this is a paragraph with<[1> in between</[1> and then there are cases ... where the<[99> number ranges from 1-100</[99>. and there are many other lines in the txt files with<[3> such tags </[3>
The required output is:
this is a paragraph with in between and then there are cases ... where the number ranges from 1-100. and there are many other lines in the txt files with such tags
I've tried this:
#!/usr/bin/python import os, sys, re, glob for infile in glob.glob(os.path.join(os.getcwd(), '*.txt')): for line in reader: line2 = line.replace('<[1> ', '') line = line2.replace('</[1> ', '') line2 = line.replace('<[1>', '') line = line2.replace('</[1>', '') print line
I've also tried this (but it seems like I'm using the wrong regex syntax):
line2 = line.replace('<[*> ', '') line = line2.replace('</[*> ', '') line2 = line.replace('<[*>', '') line = line2.replace('</[*>', '')
I dont want to hard-code the replace
from 1 to 99.
sub() method will replace all pattern occurrences in the target string. By setting the count=1 inside a re. sub() we can replace only the first occurrence of a pattern in the target string with another string.
Regex can be used to perform various tasks in Python. It is used to do a search and replace operations, replace patterns in text, check if a string contains the specific pattern.
The REGEXREPLACE( ) function uses a regular expression to find matching patterns in data, and replaces any matching values with a new string.
subn() If you want to replace a string that matches a regular expression (regex) instead of perfect match, use the sub() of the re module. In re. sub() , specify a regex pattern in the first argument, a new string in the second, and a string to be processed in the third.
This tested snippet should do it:
import re line = re.sub(r"</?\[\d+>", "", line)
Edit: Here's a commented version explaining how it works:
line = re.sub(r""" (?x) # Use free-spacing mode. < # Match a literal '<' /? # Optionally match a '/' \[ # Match a literal '[' \d+ # Match one or more digits > # Match a literal '>' """, "", line)
Regexes are fun! But I would strongly recommend spending an hour or two studying the basics. For starters, you need to learn which characters are special: "metacharacters" which need to be escaped (i.e. with a backslash placed in front - and the rules are different inside and outside character classes.) There is an excellent online tutorial at: www.regular-expressions.info. The time you spend there will pay for itself many times over. Happy regexing!
str.replace()
does fixed replacements. Use re.sub()
instead.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With