
Replace string pattern containing hyphens and underscores, problem with capturing

Tags:

regex

sed

Good day everyone!

I'm losing my mind trying to use sed to replace a string pattern. I have searched old threads about sed and escaping special characters, but I still can't get it done. I think I'm now so deep into overcomplicating the problem that I can't see the easy way.

I have a .tsv document, in which the second column represents tag-annotations that come in the form of these possibilities:

B-something
B-something-something
B-something_something
B-something-something_something
I-something
I-something-something
I-something_something
I-something-something_something

I need to replace every B-* with B, and likewise every I-* with I.

I know I could make it in Python, but I need to learn sed for future quick pre-processing.

I played with regex101 and the pattern that seems to work is the following:

\b([BI]-[a-zA-Z_-]+)\b

Using sed, I could capture the first part, i.e. 'B-first_character' by using: sed /s/\([BI]-[a-zA-Z]\)/replacing_word/g' input > output

Nothing is replaced when I use: sed /s/\([BI]-\)\([a-zA-Z_-]+\)/replacing_word/g'

The last piece of code probably contains a horrible mistake; my mind is a bit blurry now. Sorry for the stupid topic and thanks all.

cyberZamp asked Apr 30 '26 22:04


1 Answer

The sed command is malformed: there must be no / before the s substitution command. You evidently meant to put an opening single quote there, so that the whole expression is wrapped in single quotes.

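Also, + is a literal + in a POSIX BRE pattern, which is the syntax sed uses by default. Either use -E (ERE syntax, where capturing groups are written with unescaped parentheses) or replace + with \{1,\}.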
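To make the difference concrete, here is a minimal sketch comparing the three spellings on a single tag. The sample tag and the variable names are made up for illustration; they are not from the question's file.

```shell
# Hypothetical sample tag, used only for illustration.
tag='B-something-something_something'

# BRE: + is a literal plus sign, so nothing matches and the tag comes back unchanged.
bre_plus=$(printf '%s\n' "$tag" | LC_ALL=C sed 's/\([BI]\)-[a-zA-Z_-]+/\1/g')

# BRE with an interval expression: \{1,\} means "one or more".
bre_interval=$(printf '%s\n' "$tag" | LC_ALL=C sed 's/\([BI]\)-[a-zA-Z_-]\{1,\}/\1/g')

# ERE (-E): + is a quantifier, and groups use unescaped parentheses.
ere_plus=$(printf '%s\n' "$tag" | LC_ALL=C sed -E 's/([BI])-[a-zA-Z_-]+/\1/g')

printf '%s\n' "$bre_plus" "$bre_interval" "$ere_plus"
```

The first line printed should be the untouched tag, and the other two should be just B.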

To put a captured value back, use a \NUMBER backreference (for example \1) in the replacement.

You may use

LC_ALL=C sed 's/\([BI]\)-[a-zA-Z_-]\{1,\}/\1/g' file

See the online demo.

LC_ALL=C makes bracket ranges such as a-z match only ASCII letters, so the character classes behave the same way as at regex101.com.

Pattern details

  • \([BI]\) - Group 1: B or I
  • - - a hyphen
  • [a-zA-Z_-]\{1,\} - one or more ASCII letters, _ or - chars.
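The command above rewrites every match on the line. Since the question says the tags live in the second column of a .tsv file, a variant that only touches that column might look like the sketch below. The sample rows, the third column and the variable names are assumptions made up for illustration, and the expression is anchored at the start of the line instead of using the g flag.

```shell
# A literal tab, so the expression does not rely on sed understanding \t.
tab=$(printf '\t')

# Hypothetical two-row sample: token, tag, and an extra column that must stay untouched.
sample=$(printf 'Paris\tB-loc-city_name\tI-keep\nFrance\tI-loc\tB-keep\n')

# Skip the first column, keep B or I at the start of column 2, drop the rest of the tag.
result=$(printf '%s\n' "$sample" |
  LC_ALL=C sed "s/^\([^${tab}]*${tab}[BI]\)-[a-zA-Z_-]\{1,\}/\1/")

printf '%s\n' "$result"
```

Here group 1 holds the first column, the tab and the leading B or I, so \1 puts all of that back while the rest of the tag is removed. Lines whose second column does not start with B- or I- are left as they are.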
Wiktor Stribiżew answered May 04 '26 09:05


