
Regular expression that matches all valid format IPv6 addresses

At first glance, I concede that this question looks like a duplicate of this question and any other related to it:

Regular expression that matches valid IPv6 addresses

That question in fact has an answer that nearly answers my question, but not fully.

The code based on that question that has worked best for me so far, but that I still have issues with, is shown below:

private string RemoveIPv6(string sInput)
{
    string pattern = @"(([0-9a-fA-F]{1,4}:){7,7}[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,7}:|([0-9a-fA-F]{1,4}:){1,6}:[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,5}(:[0-9a-fA-F]{1,4}){1,2}|([0-9a-fA-F]{1,4}:){1,4}(:[0-9a-fA-F]{1,4}){1,3}|([0-9a-fA-F]{1,4}:){1,3}(:[0-9a-fA-F]{1,4}){1,4}|([0-9a-fA-F]{1,4}:){1,2}(:[0-9a-fA-F]{1,4}){1,5}|[0-9a-fA-F]{1,4}:((:[0-9a-fA-F]{1,4}){1,6})|:((:[0-9a-fA-F]{1,4}){1,7}|:)|fe80:(:[0-9a-fA-F]{0,4}){0,4}%[0-9a-zA-Z]{1,}|::(ffff(:0{1,4}){0,1}:){0,1}((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])|([0-9a-fA-F]{1,4}:){1,4}:((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9]))";
    //That is one looooong regex! From: https://stackoverflow.com/a/17871737/3472690
    //if (IsCompressedIPv6(sInput))
      //  sInput = UncompressIPv6(sInput);
    string output = Regex.Replace(sInput, pattern, "");
    if (output.Contains("Addresses"))
        output = output.Substring(0, "Addresses: ".Length);

    return output;
}

The issue I have with the regex pattern from David M. Syzdek's answer is that it doesn't match, and therefore doesn't remove, the full IPv6 addresses I'm throwing at it.

I'm mainly using the regex pattern to replace IPv6 addresses in strings with an empty string.

For instance,

    Addresses:  2404:6800:4003:c02::8a

As well as...

    Addresses:  2404:6800:4003:804::200e

And finally...

    Addresses:  2001:4998:c:a06::2:4008

None of these addresses is matched completely by the regex.

After the replacement, the following parts of each string remain:

    Addresses:  8a

    Addresses:  200e

    Addresses:  2:4008

As can be seen, the replacement leaves remnants of the IPv6 addresses behind, which are hard to detect and remove because of the varying forms they take. Below is the regex pattern by itself for easier analysis:

(([0-9a-fA-F]{1,4}:){7,7}[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,7}:|([0-9a-fA-F]{1,4}:){1,6}:[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,5}(:[0-9a-fA-F]{1,4}){1,2}|([0-9a-fA-F]{1,4}:){1,4}(:[0-9a-fA-F]{1,4}){1,3}|([0-9a-fA-F]{1,4}:){1,3}(:[0-9a-fA-F]{1,4}){1,4}|([0-9a-fA-F]{1,4}:){1,2}(:[0-9a-fA-F]{1,4}){1,5}|[0-9a-fA-F]{1,4}:((:[0-9a-fA-F]{1,4}){1,6})|:((:[0-9a-fA-F]{1,4}){1,7}|:)|fe80:(:[0-9a-fA-F]{0,4}){0,4}%[0-9a-zA-Z]{1,}|::(ffff(:0{1,4}){0,1}:){0,1}((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])|([0-9a-fA-F]{1,4}:){1,4}:((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9]))

Therefore, my question is: how can this regex pattern be corrected so that it matches, and therefore allows the complete removal of, any IPv6 address in a string that contains other text besides the IPv6 address(es)?

Alternatively, how can the code snippet I provided above be corrected to provide the required outcome?

For those who may be wondering, I am getting the string from the StandardOutput of nslookup commands, and the IPv6 addresses will always differ. For the examples above, I got those IPv6 addresses from "google.com" and "yahoo.com".

I am not using the built-in function to resolve DNS entries for a good reason, which I don't think will matter for the moment, therefore I am using nslookup.

In case it is needed, the code that calls that function is shown below (it is itself part of another method):

string output = "";
string garbagecan = "";
string tempRead = "";
string lastRead = "";
using (StreamReader reader = nslookup.StandardOutput)
{
     while (reader.Peek() != -1)
     {
         if (LinesRead > 3)
         {
             tempRead = reader.ReadLine();
             tempRead = RemoveIPv6(tempRead);

             if (tempRead.Contains("Addresses"))
                 output += tempRead;
             else if (lastRead.Contains("Addresses"))
                 output += tempRead.Trim() + Environment.NewLine;
             else
                 output += tempRead + Environment.NewLine;
             lastRead = tempRead;
         }
         else
             garbagecan = reader.ReadLine();
         LinesRead++;
     }
 }
 return output;

The corrected regex should remove only IPv6 addresses and leave IPv4 addresses untouched. The string passed to the regex will not contain the IPv6 address(es) alone; it will almost always contain other details, so the index at which the addresses appear is unpredictable. It should also be noted that, for some reason, the regex skips every IPv6 address after the first one that occurs.

Apologies if any details are missing; I will do my best to add them when asked. I would also prefer working code samples, if possible, as I have almost zero knowledge of regex.

asked Sep 03 '15 06:09 by Kaitlyn



1 Answer

(?:^|(?<=\s))(([0-9a-fA-F]{1,4}:){7,7}[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,7}:|([0-9a-fA-F]{1,4}:){1,6}:[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,5}(:[0-9a-fA-F]{1,4}){1,2}|([0-9a-fA-F]{1,4}:){1,4}(:[0-9a-fA-F]{1,4}){1,3}|([0-9a-fA-F]{1,4}:){1,3}(:[0-9a-fA-F]{1,4}){1,4}|([0-9a-fA-F]{1,4}:){1,2}(:[0-9a-fA-F]{1,4}){1,5}|[0-9a-fA-F]{1,4}:((:[0-9a-fA-F]{1,4}){1,6})|:((:[0-9a-fA-F]{1,4}){1,7}|:)|fe80:(:[0-9a-fA-F]{0,4}){0,4}%[0-9a-zA-Z]{1,}|::(ffff(:0{1,4}){0,1}:){0,1}((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])|([0-9a-fA-F]{1,4}:){1,4}:((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9]))(?=\s|$)

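Using lookarounds, you can enforce a complete match rather than a partial match. Without them, the regex engine accepts the first alternative that succeeds: for 2404:6800:4003:c02::8a, the alternative ([0-9a-fA-F]{1,4}:){1,7}: already matches the prefix 2404:6800:4003:c02:: and the match ends there, leaving 8a behind. Requiring whitespace or a string boundary on both sides of the match forces the engine to backtrack into the alternative that covers the whole address. See demo.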

https://regex101.com/r/cT0hV4/5
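
As a rough sketch of how the pattern above could be dropped into the asker's RemoveIPv6 method, the snippet below wraps it in Regex.Replace. The class name Ipv6Stripper and the constant name Pattern are made up for illustration; the pattern itself is the one from this answer, only split across lines for readability. It assumes each address is delimited by whitespace or by the start or end of the string, as in the nslookup output shown in the question.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class; the name is not from the original question or answer.
public static class Ipv6Stripper
{
    // The answer's pattern, one alternative per line. The leading lookbehind and
    // the trailing lookahead require whitespace or a string boundary around the match.
    private const string Pattern =
        @"(?:^|(?<=\s))(" +
        @"([0-9a-fA-F]{1,4}:){7,7}[0-9a-fA-F]{1,4}|" +
        @"([0-9a-fA-F]{1,4}:){1,7}:|" +
        @"([0-9a-fA-F]{1,4}:){1,6}:[0-9a-fA-F]{1,4}|" +
        @"([0-9a-fA-F]{1,4}:){1,5}(:[0-9a-fA-F]{1,4}){1,2}|" +
        @"([0-9a-fA-F]{1,4}:){1,4}(:[0-9a-fA-F]{1,4}){1,3}|" +
        @"([0-9a-fA-F]{1,4}:){1,3}(:[0-9a-fA-F]{1,4}){1,4}|" +
        @"([0-9a-fA-F]{1,4}:){1,2}(:[0-9a-fA-F]{1,4}){1,5}|" +
        @"[0-9a-fA-F]{1,4}:((:[0-9a-fA-F]{1,4}){1,6})|" +
        @":((:[0-9a-fA-F]{1,4}){1,7}|:)|" +
        @"fe80:(:[0-9a-fA-F]{0,4}){0,4}%[0-9a-zA-Z]{1,}|" +
        @"::(ffff(:0{1,4}){0,1}:){0,1}((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])|" +
        @"([0-9a-fA-F]{1,4}:){1,4}:((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])" +
        @")(?=\s|$)";

    // Build the regex once and reuse it for every line that is read.
    private static readonly Regex Ipv6InText = new Regex(Pattern);

    // Removes every whitespace-delimited IPv6 address from the input.
    // IPv4 addresses and all other text are left as they are.
    public static string RemoveIPv6(string input)
    {
        return Ipv6InText.Replace(input, "");
    }
}
```

For example, Ipv6Stripper.RemoveIPv6("Addresses:  2404:6800:4003:c02::8a") returns "Addresses:  ", and a line that holds only an IPv4 address comes back unchanged. Because the lookarounds accept only whitespace or a string boundary, an address that is followed directly by punctuation or wrapped in brackets would not be removed by this sketch.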

answered Oct 22 '22 17:10 by vks