Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Regex Replacing : to ":" etc

I've got a bunch of strings like:

"Hello, here's a test colon:. Here's a test semi-colon&#59;"

I would like to replace that with

"Hello, here's a test colon:. Here's a test semi-colon;"

And so on for all printable ASCII values.

At present I'm using boost::regex_search to match &#(\d+);, building up a string as I process each match in turn (including appending the substring containing no matches since the last match I found).

Can anyone think of a better way of doing it? I'm open to non-regex methods, but regex seemed a reasonably sensible approach in this case.

Thanks,

Dom

like image 289
Dominic Rodger Avatar asked Jan 09 '09 13:01

Dominic Rodger


People also ask

Can I replace with regex?

replace in JavaScript. To use RegEx, the first argument of replace will be replaced with regex syntax, for example /regex/ . This syntax serves as a pattern where any parts of the string that match it will be replaced with the new substring. The string 3foobar4 matches the regex /\d.

How do you replace special characters in regex?

To replace special characters, use replace() in JavaScript.

What is regex replace in SQL?

Replace all occurrences of a substring that match a regular expression with another substring. It is similar to the REPLACE function, except it uses a regular expression to select the substring to be replaced.


2 Answers

The big advantage of using a regex is to deal with the tricky cases like & Entity replacement isn't iterative, it's a single step. The regex is also going to be fairly efficient: the two lead characters are fixed, so it will quickly skip anything not starting with &#. Finally, the regex solution is one without a lot of surprises for future maintainers.

I'd say a regex was the right choice.

Is it the best regex, though? You know you need two digits and if you have 3 digits, the first one will be a 1. Printable ASCII is after all  -~. For that reason, you could consider &#1?\d\d;.

As for replacing the content, I'd use the basic algorithm described for boost::regex::replace :

For each match // Using regex_iterator<>
    Print the prefix of the match
    Remove the first 2 and last character of the match (&#;)
    lexical_cast the result to int, then truncate to char and append.

Print the suffix of the last match.
like image 165
MSalters Avatar answered Oct 11 '22 09:10

MSalters


This will probably earn me some down votes, seeing as this is not a c++, boost or regex response, but here's a SNOBOL solution. This one works for ASCII. Am working on something for Unicode.

        NUMS = '1234567890'
MAIN    LINE = INPUT                                :F(END)
SWAP    LINE ?  '&#' SPAN(NUMS) . N ';' = CHAR( N ) :S(SWAP)
        OUTPUT = LINE                               :(MAIN)
END
like image 31
bugmagnet Avatar answered Oct 11 '22 08:10

bugmagnet