Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using regular expressions with C++ on Unix

Tags:

c++

regex

unix

I'm familiar with Regex itself, but whenever I try to find any examples or documentation to use regex with Unix computers, I just get tutorials on how to write regex or how to use the .NET specific libraries available for Windows. I've been searching for a while and I can't find any good tutorials on C++ regex on Unix machines.

What I'm trying to do:

Parse a string using regex by breaking it up and then reading the different subgroups. To make a PHP analogy, something like preg_match that returns all $matches.

like image 665
stan Avatar asked Feb 08 '10 20:02

stan


People also ask

Can we use regular expression in Unix?

A regular expression(regex) is defined as a pattern that defines a class of strings. Given a string, we can then test if the string belongs to this class of patterns. Regular expressions are used by many of the unix utilities like grep, sed, awk, vi, emacs etc.

Can you use regular expressions in C?

A regular expression is a sequence of characters used to match a pattern to a string. The expression can be used for searching text and validating input. Remember, a regular expression is not the property of a particular language. POSIX is a well-known library used for regular expressions in C.

Does Linux support regular expressions?

Linux Regular Expressions are special characters which help search data and matching complex patterns. Regular expressions are shortened as 'regexp' or 'regex'. They are used in many Linux programs like grep, bash, rename, sed, etc.


2 Answers

Consider using Boost.Regex.

An example (from the website):

bool validate_card_format(const std::string& s)
{
   static const boost::regex e("(\\d{4}[- ]){3}\\d{4}");
   return regex_match(s, e);
}

Another example:

// match any format with the regular expression:
const boost::regex e("\\A(\\d{3,4})[- ]?(\\d{4})[- ]?(\\d{4})[- ]?(\\d{4})\\z");
const std::string machine_format("\\1\\2\\3\\4");
const std::string human_format("\\1-\\2-\\3-\\4");

std::string machine_readable_card_number(const std::string s)
{
   return regex_replace(s, e, machine_format, boost::match_default | boost::format_sed);
}

std::string human_readable_card_number(const std::string s)
{
   return regex_replace(s, e, human_format, boost::match_default | boost::format_sed);
}
like image 191
0xfe Avatar answered Oct 27 '22 19:10

0xfe


Look up the documentation for TR1 regexes or (almost equivalently) boost regex. Both work quite nicely on various Unix systems. The TR1 regex classes have been accepted into C++ 0x, so though they're not exactly part of the standard yet, they will be reasonably soon.

Edit: To break a string into subgroups, you can use an sregex_token_iterator. You can specify either what you want matched as tokens, or what you want matched as separators. Here's a quickie demo of both:

#include <iterator>
#include <regex>
#include <string>
#include <iostream>

int main() { 

    std::string line;

    std::cout << "Please enter some words: " << std::flush;
    std::getline(std::cin, line);

    std::tr1::regex r("[ .,:;\\t\\n]+");
    std::tr1::regex w("[A-Za-z]+");

    std::cout << "Matching words:\n";
    std::copy(std::tr1::sregex_token_iterator(line.begin(), line.end(), w),
        std::tr1::sregex_token_iterator(), 
        std::ostream_iterator<std::string>(std::cout, "\n"));

    std::cout << "\nMatching separators:\n";
    std::copy(std::tr1::sregex_token_iterator(line.begin(), line.end(), r, -1), 
        std::tr1::sregex_token_iterator(), 
        std::ostream_iterator<std::string>(std::cout, "\n"));

    return 0;
}

If you give it input like this: "This is some 999 text", the result is like this:

Matching words:
This
is
some
text

Matching separators:
This
is
some
999
text
like image 24
Jerry Coffin Avatar answered Oct 27 '22 20:10

Jerry Coffin