Extract part of a regex match


Use ( ) in the regular expression to define a capturing group, and call group(1) on the match object in Python to retrieve the captured string. re.search returns None when the pattern is not found, so check the result before calling group():

import re

# html is assumed to hold the page source as a string
title_search = re.search(r'<title>(.*)</title>', html, re.IGNORECASE)

if title_search:
    title = title_search.group(1)
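One caveat with this pattern: (.*) is greedy, and . does not match newlines by default. Here is a minimal sketch of how a non-greedy group combined with re.DOTALL behaves differently. The sample html string and the helper name extract_title are made up for illustration and are not part of the original answer:

```python
import re

# Hypothetical sample input with two title-like elements.
html = "<TITLE>First page</TITLE><p>body</p><title>Second</title>"


def extract_title(html_text):
    """Return the text of the first <title> element, or None if absent."""
    # (.*?) is non-greedy, so it stops at the first closing tag;
    # re.DOTALL lets '.' also match newlines inside the title.
    match = re.search(r"<title>(.*?)</title>", html_text,
                      re.IGNORECASE | re.DOTALL)
    return match.group(1) if match else None


# The greedy version runs on to the LAST closing tag in the string.
greedy = re.search(r"<title>(.*)</title>", html, re.IGNORECASE).group(1)

print(extract_title(html))  # First page
print(greedy)               # First page</TITLE><p>body</p><title>Second
```

For a document with a single title on one line, both patterns give the same result. The difference only shows up when the closing tag appears more than once or the title spans several lines.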

Starting with Python 3.8 and the introduction of assignment expressions (PEP 572, the := operator), Krzysztof Krasoń's solution can be tightened slightly by capturing the match result in a variable directly within the if condition and reusing it in the condition's body:

import re

pattern = r'<title>(.*)</title>'
text = '<title>hello</title>'
if match := re.search(pattern, text, re.IGNORECASE):
    title = match.group(1)
    print(title)  # hello
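The same assignment expression also works as a filter inside a comprehension, which can be handy when scanning several strings at once. A rough sketch, assuming Python 3.8+ and a made-up docs list (only the pattern comes from the answer above):

```python
import re

pattern = r"<title>(.*)</title>"
# Hypothetical sample documents; the second one has no title.
docs = ["<title>hello</title>", "<p>no title</p>", "<TITLE>world</TITLE>"]

# m is bound only when re.search finds a match, so strings
# without a title are skipped instead of raising an error.
titles = [m.group(1) for doc in docs
          if (m := re.search(pattern, doc, re.IGNORECASE))]

print(titles)  # ['hello', 'world']
```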

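Try using capturing groups. Note that this one-liner raises AttributeError if the pattern does not match, because re.search returns None in that case: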

title = re.search('<title>(.*)</title>', html, re.IGNORECASE).group(1)
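To see what the chained call does when there is no match, here is a small sketch with a made-up input string that contains no title element. Catching AttributeError is one possible way to handle it, not the only one:

```python
import re

pattern = r"<title>(.*)</title>"
html_without_title = "<p>no title element here</p>"  # hypothetical input

try:
    title = re.search(pattern, html_without_title, re.IGNORECASE).group(1)
except AttributeError:
    # re.search returned None, and None has no .group() method.
    title = None

print(title)  # None
```

Checking the match object before calling group(1) avoids the exception altogether and is usually easier to read.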

May I recommend Beautiful Soup? It is a very good library for parsing an entire HTML document.

from bs4 import BeautifulSoup

soup = BeautifulSoup(html_doc, 'html.parser')
titleName = soup.title.string  # .name would return the tag name 'title'


Tags: python, html, regex, html-content-extraction
