How to remove an entire sentence if it contains a string

Tags: c++, regex, c++11

I need to remove an entire sentence from a string if it contains a pattern. Here the pattern is "Link" or "link": if it is present in the string, the whole sentence containing it must be removed.

std::string subject = "This is previous sentence. This can be any sentences. Link 2.1.19.3 [Example]. This is can be any other sentence. This is next sentence.";   

std::string removeRedundantString(std::string subject)
{
    std::string removeSee = subject;
    std::smatch match;

    // Matches from the first "Link" to the end of the string
    std::regex redundantSee("(Link.*$)");

    if (std::regex_search(subject, match, redundantSee))
    {
        removeSee = std::regex_replace(subject, redundantSee, "");
    }
    return removeSee;
}

Expected output:

This is previous sentence. This can be any sentences.This is can be any other sentence. This is next sentence.

Actual output:

This is previous sentence. This can be any sentences.

The actual output above is produced because the regex "(Link.*$)" removes everything from Link to the end of the string. I cannot figure out which regex would give the expected output. Here are the other test cases I need to handle:

Testcase 1:

std::string subject = "Note this is second pattern, Ops that next the scheduler; link the amount for the full list of docs. The number of value varies from 0 to 4.";

Output: Note this is second pattern, Ops that next the scheduler;The number of value varies from 0 to 4.

Testcase 2:

std::string subject = "This is another pattern. (Link Doc::78::hello::Core::mount). Since this patern includes non-numeric value.";

Output: This is another pattern.Since this patern includes non-numeric value.

Any help would be appreciated.

Ratnesh asked Oct 15 '22 21:10



1 Answer

I'd recommend

std::regex redundantSee(R"(\W*\b[Ll]ink\b(?:\d+(?:\.\d+)*|[^.])*[.?!])");

Note the raw string literal syntax, R"(...)". The pattern goes in place of the ... exactly as written, with no additional escaping of the backslashes.
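To illustrate the escaping point, here is a minimal sketch that writes one sub-pattern of the regex both ways. The names rawPattern, escapedPattern and isDottedNumber are made up for this illustration and do not come from the question.

```cpp
#include <regex>
#include <string>

// The same pattern twice: as a raw string literal and as an ordinary
// string literal in which every backslash has to be doubled.
// (Hypothetical names, used only for this illustration.)
const std::string rawPattern     = R"(\d+(?:\.\d+)*)";
const std::string escapedPattern = "\\d+(?:\\.\\d+)*";

// Returns true when `text` is nothing but a dotted number such as "2.1.19.3".
bool isDottedNumber(const std::string& text)
{
    static const std::regex dotted(rawPattern);
    return std::regex_match(text, dotted);
}
```

Both literals produce the same string, so either can be passed to std::regex; the raw form is just easier to read and to compare against a regex tester.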

Regex details:

  • \W* - zero or more non-word chars
  • \b - a word boundary
  • [Ll]ink - Link or link word
  • \b - a word boundary
  • (?:\d+(?:\.\d+)*|[^.])* - zero or more sequences of
    • \d+(?:\.\d+)* - one or more digits followed by zero or more sequences of a . and one or more digits
    • | - or
    • [^.] - any char other than a .
  • [.?!] - a ?, . or !.
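
Putting it together, a minimal sketch of the question's function with this pattern might look as follows. It reuses the name removeRedundantString from the question; taking the argument by const reference and dropping the regex_search guard are my own assumptions, not something the question requires. Tracing the pattern by hand, the leading \W* also consumes the punctuation directly before the word, so the "." or ";" in front of the removed sentence is removed as well.

```cpp
#include <regex>
#include <string>

// Removes every sentence that contains the whole word "Link" or "link".
std::string removeRedundantString(const std::string& subject)
{
    // Built once and reused across calls; constructing a std::regex is costly.
    static const std::regex redundantSee(
        R"(\W*\b[Ll]ink\b(?:\d+(?:\.\d+)*|[^.])*[.?!])");

    // regex_replace returns the input unchanged when nothing matches,
    // so a separate regex_search check is not needed.
    return std::regex_replace(subject, redundantSee, "");
}
```

Calling removeRedundantString(subject) on each string from the question returns the text without the Link sentence, while a string that does not contain the word comes back unchanged.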
Wiktor Stribiżew answered Oct 20 '22 17:10
