
How to use regex to parse a number from HTML?

Tags: python, regex

I want to write a simple regular expression in Python that extracts a number from HTML. The HTML sample is as follows:

Your number is <b>123</b> 

Now, how can I extract "123", i.e. the contents of the first bold text after the string "Your number is"?

Saqib asked Jun 23 '12 16:06




1 Answer

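    import re

    # Capture the digits inside the <b>...</b> that follows "Your number is".
    m = re.search(r"Your number is <b>(\d+)</b>",
                  "xxx Your number is <b>123</b>  fdjsk")
    if m:
        print(m.group(1))  # first capture group, as a string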
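If the real markup is less tidy than the sample, the same idea can be loosened a little. The following is only a sketch under assumptions that are not in the question: that the tag may be upper-case or carry attributes, and that whitespace may surround the digits. The helper name `extract_number` is made up for illustration.

```python
import re

# Hypothetical helper, not part of the original answer.
# Assumptions: the tag may be <b> or <B>, may carry attributes,
# and whitespace (including newlines) may surround the digits.
PATTERN = re.compile(
    r"Your number is\s*<b\b[^>]*>\s*(\d+)\s*</b>",
    re.IGNORECASE,
)


def extract_number(html):
    """Return the number following "Your number is" as an int, or None."""
    m = PATTERN.search(html)
    return int(m.group(1)) if m else None


print(extract_number("xxx Your number is <b>123</b>  fdjsk"))
print(extract_number("Your number is\n<B class='n'> 42 </B>"))
print(extract_number("no number here"))
```

The `\b` after `<b` keeps the pattern from matching other tags that merely start with "b", such as `<body>`. For anything beyond a fixed snippet like this, a real HTML parser is usually the safer choice.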
Yevgen Yampolskiy answered Sep 30 '22 13:09