How to extract an IP address from an HTML string?

I want to extract an IP address from a string (actually a one-line HTML) using Python.

>>> s = "<html><head><title>Current IP Check</title></head><body>Current IP Address: 165.91.15.131</body></html>" 

-- '165.91.15.131' is what I want!

I tried using regular expressions, but so far I can only get to the first number.

>>> import re
>>> ip = re.findall( r'([0-9]+)(?:\.[0-9]+){3}', s )
>>> ip
['165']

But I don't have a firm grasp of regular expressions; the above code was found elsewhere on the web and modified.

GoJian asked May 23 '10 06:05


2 Answers

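Remove your capturing group. When a pattern contains a capturing group, re.findall returns only the text captured by that group instead of the whole match: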

ip = re.findall( r'[0-9]+(?:\.[0-9]+){3}', s ) 

Result:

['165.91.15.131'] 
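To make the difference concrete, here is a small sketch comparing the two patterns on the string from the question. The variable names with_group, no_group and first_ip are illustrative only; re.search with group(0) is shown as one possible alternative when only the first address is needed.

```python
import re

s = ("<html><head><title>Current IP Check</title></head>"
     "<body>Current IP Address: 165.91.15.131</body></html>")

# One capturing group: findall returns only what that group captured.
with_group = re.findall(r'([0-9]+)(?:\.[0-9]+){3}', s)

# No capturing group: findall returns the whole match.
no_group = re.findall(r'[0-9]+(?:\.[0-9]+){3}', s)

# group(0) is always the whole match, whether or not the pattern has groups.
m = re.search(r'([0-9]+)(?:\.[0-9]+){3}', s)
first_ip = m.group(0) if m else None

print(with_group)  # ['165']
print(no_group)    # ['165.91.15.131']
print(first_ip)    # 165.91.15.131
```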

Notes:

  • If you are parsing HTML it might be a good idea to look at BeautifulSoup.
  • Your regular expression matches some invalid IP addresses such as 0.00.999.9999. This isn't necessarily a problem, but you should be aware of it and possibly handle this situation. You could change the + to {1,3} for a partial fix without making the regular expression overly complex.
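As a rough sketch of the second note, the pattern below uses {1,3} in place of + and then checks each octet's range in plain Python. The names LOOSE_IP and find_ips are hypothetical helpers, not part of the answer, and the range check is just one way to handle the leftover invalid matches.

```python
import re

# The answer's pattern with + tightened to {1,3}; this is still only a partial fix.
LOOSE_IP = re.compile(r'[0-9]{1,3}(?:\.[0-9]{1,3}){3}')

def find_ips(text):
    """Return dotted quads from text whose four octets are all at most 255."""
    candidates = LOOSE_IP.findall(text)
    return [c for c in candidates
            if all(int(octet) <= 255 for octet in c.split('.'))]

s = ("<html><head><title>Current IP Check</title></head>"
     "<body>Current IP Address: 165.91.15.131</body></html>")

print(find_ips(s))                        # ['165.91.15.131']
print(LOOSE_IP.findall("0.00.999.9999"))  # ['0.00.999.999'], still matched by the regex
print(find_ips("0.00.999.9999"))          # [], rejected by the range check
```

Keeping the regex loose and validating afterwards leaves the pattern short, at the cost of a second pass over the candidates.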
Mark Byers answered Oct 04 '22 18:10


You can use the following regex to capture only valid IP addresses. Each octet alternation has to be wrapped in a non-capturing group; without the grouping, the top-level | splits the whole pattern and findall returns the octets separately as ['165', '91', '15', '131'].

re.findall(r'\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b', s)

returns

['165.91.15.131']
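Because | has the lowest precedence in a regular expression, an octet alternation needs its own group before it can be followed by \. as a unit. The sketch below builds such a pattern from a named fragment to keep it readable; OCTET and STRICT_IP are illustrative names, not from the answer.

```python
import re

# One octet: 250-255, 200-249, or 0-199 (leading zeros are accepted).
OCTET = r'(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'

# Four octets separated by dots, bounded by word boundaries.
STRICT_IP = re.compile(r'\b' + OCTET + r'(?:\.' + OCTET + r'){3}\b')

s = ("<html><head><title>Current IP Check</title></head>"
     "<body>Current IP Address: 165.91.15.131</body></html>")

print(STRICT_IP.findall(s))                # ['165.91.15.131']
print(STRICT_IP.findall("0.00.999.9999"))  # []
print(STRICT_IP.findall("256.1.1.1"))      # []
```

All groups here are non-capturing, so findall returns whole addresses rather than individual octets.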
Snehal answered Oct 04 '22 19:10