In my program, I have a list of "server address" in the following format:
host[:port]
The brackets here, indicate that the port
is optional.
host
can be a hostname, an IPv4 or IPv6 address (possibly in "bracket-enclosed" notation).port
, if present can be a numeric port number or a service string (like: "http" or "ssh").If port
is present and host
is an IPv6 address, host
must be in "bracket-enclosed" notation (Example: [::1]
)
Here are some valid examples:
localhost
localhost:11211
127.0.0.1:http
[::1]:11211
::1
[::1]
And an invalid example:
::1:80 // Invalid: Is this the IPv6 address ::1:80 and a default port, or the IPv6 address ::1 and the port 80 ?
::1:http // This is not ambigous, but for simplicity sake, let's consider this is forbidden as well.
My goal is to separate such entries in two parts (obviously host
and port
). I don't care if either the host
or port
are invalid as long as they don't contain a non-bracket-enclosed :
(290.234.34.34.5
is ok for host
, it will be rejected in the next process); I just want to separate the two parts, or if there is no port
part, to know it somehow.
I tried to do something with std::stringstream
but everything I come up to seems hacky and not really elegant.
How would you do this in C++
?
I don't mind answers in C
but C++
is prefered. Any boost
solution is welcome as well.
Thank you.
Some programs can just process an entire file at once, and other programs need to examine the file line-by-line. In the latter case, you likely need to parse data in each line. Fortunately, the C programming language has a standard C library function to do just that.
The C/C++ parser is used for C and C++ language source files. The C/C++ parser uses syntax highlighting to identify language elements, including the following elements: Identifiers.
C is a bit hard to parse because statements like `A * B();` will mean different things if A is defined as a type or note. C++ is much harder to parse because the template syntax is hard to disambiguate from less than or greater than.
To parse, in computer science, is where a string of commands – usually a program – is separated into more easily processed components, which are analyzed for correct syntax and then attached to tags that define each component. The computer can then process each program chunk and transform it into machine language.
Have you looked at boost::spirit? It might be overkill for your task, though.
Here's a simple class that uses boost::xpressive to do the job of verifying the type of IP address and then you can parse the rest to get the results.
Usage:
const std::string ip_address_str = "127.0.0.1:3282";
IpAddress ip_address = IpAddress::Parse(ip_address_str);
std::cout<<"Input String: "<<ip_address_str<<std::endl;
std::cout<<"Address Type: "<<IpAddress::TypeToString(ip_address.getType())<<std::endl;
if (ip_address.getType() != IpAddress::Unknown)
{
std::cout<<"Host Address: "<<ip_address.getHostAddress()<<std::endl;
if (ip_address.getPortNumber() != 0)
{
std::cout<<"Port Number: "<<ip_address.getPortNumber()<<std::endl;
}
}
The header file of the class, IpAddress.h
#pragma once
#ifndef __IpAddress_H__
#define __IpAddress_H__
#include <string>
class IpAddress
{
public:
enum Type
{
Unknown,
IpV4,
IpV6
};
~IpAddress(void);
/**
* \brief Gets the host address part of the IP address.
* \author Abi
* \date 02/06/2010
* \return The host address part of the IP address.
**/
const std::string& getHostAddress() const;
/**
* \brief Gets the port number part of the address if any.
* \author Abi
* \date 02/06/2010
* \return The port number.
**/
unsigned short getPortNumber() const;
/**
* \brief Gets the type of the IP address.
* \author Abi
* \date 02/06/2010
* \return The type.
**/
IpAddress::Type getType() const;
/**
* \fn static IpAddress Parse(const std::string& ip_address_str)
*
* \brief Parses a given string to an IP address.
* \author Abi
* \date 02/06/2010
* \param ip_address_str The ip address string to be parsed.
* \return Returns the parsed IP address. If the IP address is
* invalid then the IpAddress instance returned will have its
* type set to IpAddress::Unknown
**/
static IpAddress Parse(const std::string& ip_address_str);
/**
* \brief Converts the given type to string.
* \author Abi
* \date 02/06/2010
* \param address_type Type of the address to be converted to string.
* \return String form of the given address type.
**/
static std::string TypeToString(IpAddress::Type address_type);
private:
IpAddress(void);
Type m_type;
std::string m_hostAddress;
unsigned short m_portNumber;
};
#endif // __IpAddress_H__
The source file for the class, IpAddress.cpp
#include "IpAddress.h"
#include <boost/xpressive/xpressive.hpp>
namespace bxp = boost::xpressive;
static const std::string RegExIpV4_IpFormatHost = "^[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]+(\\:[0-9]{1,5})?$";
static const std::string RegExIpV4_StringHost = "^[A-Za-z0-9]+(\\:[0-9]+)?$";
IpAddress::IpAddress(void)
:m_type(Unknown)
,m_portNumber(0)
{
}
IpAddress::~IpAddress(void)
{
}
IpAddress IpAddress::Parse( const std::string& ip_address_str )
{
IpAddress ipaddress;
bxp::sregex ip_regex = bxp::sregex::compile(RegExIpV4_IpFormatHost);
bxp::sregex str_regex = bxp::sregex::compile(RegExIpV4_StringHost);
bxp::smatch match;
if (bxp::regex_match(ip_address_str, match, ip_regex) || bxp::regex_match(ip_address_str, match, str_regex))
{
ipaddress.m_type = IpV4;
// Anything before the last ':' (if any) is the host address
std::string::size_type colon_index = ip_address_str.find_last_of(':');
if (std::string::npos == colon_index)
{
ipaddress.m_portNumber = 0;
ipaddress.m_hostAddress = ip_address_str;
}else{
ipaddress.m_hostAddress = ip_address_str.substr(0, colon_index);
ipaddress.m_portNumber = atoi(ip_address_str.substr(colon_index+1).c_str());
}
}
return ipaddress;
}
std::string IpAddress::TypeToString( Type address_type )
{
std::string result = "Unknown";
switch(address_type)
{
case IpV4:
result = "IP Address Version 4";
break;
case IpV6:
result = "IP Address Version 6";
break;
}
return result;
}
const std::string& IpAddress::getHostAddress() const
{
return m_hostAddress;
}
unsigned short IpAddress::getPortNumber() const
{
return m_portNumber;
}
IpAddress::Type IpAddress::getType() const
{
return m_type;
}
I have only set the rules for IPv4 because I don't know the proper format for IPv6. But I'm pretty sure it's not hard to implement it. Boost Xpressive is just a template based solution and hence do not require any .lib files to be compiled into your exe, which I believe makes is a plus.
By the way just to break down the format of regex in a nutshell...
^ = start of string
$ = end of string
[] = a group of letters or digits that can appear
[0-9] = any single-digit between 0 and 9
[0-9]+ = one or more digits between 0 and 9
the '.' has a special meaning for regex but since our format has 1 dot in an ip-address format we need to specify that we want a '.' between digits by using '\.'. But since C++ needs an escape sequence for '\' we'll have to use "\\."
? = optional component
So, in short, "^[0-9]+$" represents a regex, which is true for an integer.
"^[0-9]+\.$" means an integer that ends with a '.'
"^[0-9]+\.[0-9]?$" is either an integer that ends with a '.' or a decimal.
For an integer or a real number, the regex would be "^[0-9]+(\.[0-9]*)?$".
RegEx an integer that is between 2 and 3 numbers is "^[0-9]{2,3}$".
Now to break down the format of the ip address:
"^[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]+(\\:[0-9]{1,5})?$"
This is synonymous to: "^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]+(\:[0-9]{1,5})?$", which means:
[start of string][1-3 digits].[1-3 digits].[1-3 digits].[1-3 digits]<:[1-5 digits]>[end of string]
Where, [] are mandatory and <> are optional
The second RegEx is simpler than this. It's just a combination of a alpha-numeric value followed by an optional colon and port-number.
By the way, if you would like to test out RegEx you can use this site.
Edit: I failed to notice that you optionally had http instead of port number. For that you can change the expression to:
"^[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]+(\\:([0-9]{1,5}|http|ftp|smtp))?$"
This accepts formats like:
127.0.0.1
127.0.0.1:3282
127.0.0.1:http
217.0.0.1:ftp
18.123.2.1:smtp
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With