
Sed to remove underscores and promote character

Tags: c++, regex, sed, awk

I am trying to migrate some code from an old naming scheme to a new one. The old naming scheme is:

int some_var_name;

The new one is:

int someVarName_;

So what I would like is some form of sed / regexy goodness to ease the process. Fundamentally, what needs to happen is:
find each lowercase word that contains an _, remove every underscore and promote the character to its right to uppercase, then append an _ to the end of the match.

Is it possible to do this with Sed and/or Awk and regex? If not why not?

Any example scripts would be appreciated.

Thanks very much for any assistance.

EDIT:
For a bit of clarity: the renaming is for a number of files that were written with the wrong naming convention and need to be brought into line with the rest of the codebase. The script is not expected to do a perfect replace that leaves everything in a compilable state. Rather, it will be run and the result looked over by hand for any anomalies. The replace script is purely to ease the burden of correcting everything by hand, which I'm sure you would agree is considerably tedious.

radman asked Jun 29 '10 00:06



1 Answer

sed -re 's,[a-z]+(_[a-z]+)+,&_,g' -e 's,_([a-z]),\u\1,g'

Explanation:

This is a sed command with 2 expressions (each in quotes after a -e). The -r flag turns on extended regular expressions in GNU sed, so + and ( ) work without backslashes. s,,,g is a global substitution. You usually see it with slashes instead of commas, but I think this is easier to read when you're using backslashes in the patterns (and no commas). The trailing g (for "global") means to apply this substitution to all matches on each line, rather than just the first.

The first expression will append an underscore to every token made up of a lowercase word ([a-z]+) followed by a nonzero number of lowercase words separated by underscores ((_[a-z]+)+). We replace this with &_, where & means "everything that matched", and _ is just a literal underscore. So in total, this expression is saying to add an underscore to the end of every underscore_separated_lowercase_token.
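To see the first expression in isolation, here is a minimal sketch that assumes GNU sed (for -r). The function name append_underscore is made up for illustration and is not part of the original answer.

```shell
# Hypothetical helper: runs only the first expression from the command above.
# Assumes GNU sed, where -r enables extended regular expressions.
append_underscore() {
  sed -r -e 's,[a-z]+(_[a-z]+)+,&_,g'
}

# "some_var_name" contains underscores, so it gains a trailing underscore;
# "int" and "count" have none and are left alone.
printf '%s\n' 'int some_var_name; int count;' | append_underscore
# with GNU sed this prints: int some_var_name_; int count;
```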

The second expression matches the pattern _([a-z]), where everything between ( and ) is a capturing group. This means we can refer back to it later as \1 (because it's the first capturing group; if there were more, they would be \2, \3, and so on). So we're saying to match a lowercase letter following an underscore, and remember the letter.

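We replace it with \u\1, which is the letter we just remembered, but made uppercase by that \u (a GNU sed extension). The underscore itself is part of the match but not part of the replacement, so it disappears.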
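The second expression can be tried on its own in the same way. This sketch again assumes GNU sed, since \u is a GNU extension, and the function name promote_after_underscore is hypothetical. The input already carries the trailing underscore that the first expression would have added.

```shell
# Hypothetical helper: runs only the second expression from the command above.
# Assumes GNU sed; \u in the replacement uppercases the next character.
promote_after_underscore() {
  sed -r -e 's,_([a-z]),\u\1,g'
}

# Each "_x" becomes "X". The final "_" is followed by ";" rather than a
# lowercase letter, so it does not match and stays in place.
printf '%s\n' 'int some_var_name_;' | promote_after_underscore
# with GNU sed this prints: int someVarName_;
```

This is also why the order of the two expressions matters: the trailing underscore has to be appended while the inner underscores still exist to identify the token.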

This code doesn't do anything clever to avoid munging #include lines or the like; it will replace every instance of a lowercase letter following an underscore with its uppercase equivalent.
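If the #include lines are a concern, one possible variation (not part of the original answer) is to put a negated address in front of both substitutions so that they are skipped on lines that look like #include directives. This is a sketch assuming GNU sed. The function name convert_names and the exact #include pattern are assumptions, and other cases such as string literals or library identifiers would still need a manual pass.

```shell
# Hypothetical wrapper: same two substitutions, but skipped on #include lines.
# Assumes GNU sed. "/regex/!{ ... }" runs the block only on non-matching lines.
convert_names() {
  sed -r '/^[[:space:]]*#[[:space:]]*include/!{
    s,[a-z]+(_[a-z]+)+,&_,g
    s,_([a-z]),\u\1,g
  }'
}

# The #include line passes through untouched; the declaration is renamed.
printf '%s\n' '#include "some_header.h"' 'int some_var_name;' | convert_names
```

To apply it to a file rather than a pipe, the output can be redirected to a new file and compared against the original before replacing anything.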

Vineet answered Sep 22 '22 22:09
