Find number, and remove adjacent characters equal to this number

Tags: unix, sed, awk, perl

Part of my four-column output looks like this:

5    cc1kcc1kc    5    cc1kcc1kc
5    cc2ppggg   5    cc2ppggg
6    ccg12qqqqqqqqqqqqggg    10 ccccg11qqqqqqqqqqqggggg 
3    4qqqqcgc1q   12    cgccgccgccgc

I only want the second and fourth columns changed. Is there a way with awk/sed to remove each number together with the characters next to it? Or would it be easier or better to use a Perl script for this transformation?

The resulting output should look like this:

5    ccccc    5    ccccc
5    ccggg    5    ccggg
6    ccgggg   10    ccccgggggg 
3    cgc    12    cgccgccgccgc
mmot asked Jun 25 '12 09:06



2 Answers

Taking the question literally, this removes each number n embedded in fields 2 and 4, together with the n characters that follow it.

perl -lane 'for $i (1, 3) {@nums = $F[$i] =~ /(\d+)/g; for $num (@nums) {$F[$i] =~ s/$num.{$num}//}}; print join("\t", @F)'

The other answers remove the number and the entire run of identical characters that follows it, however long that run is.

To illustrate the difference between my answer and the others, use the following input:

6    ccg8qqqqqqqqqqqqggg    10 ccccg3qqqqqqqqqqqggggg

My version outputs this:

6    ccgqqqqggg     10      ccccgqqqqqqqqggggg

while theirs output this:

6    ccgggg    10 ccccgggggg
Dennis Williamson answered Nov 10 '22 04:11
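
For anyone who wants to trace the literal reading step by step, here is a minimal Python sketch of the same idea. It is an illustration, not a port of the one-liner: the names `strip_counted` and `transform_line` are made up here, fields are assumed to be whitespace-separated, and each field is scanned left to right instead of collecting all the numbers first as the Perl does.

```python
import re

DIGITS = re.compile(r"\d+")

def strip_counted(field):
    """Drop each embedded number n plus the n characters right after it."""
    out = []
    i = 0
    while i < len(field):
        m = DIGITS.match(field, i)
        if m:
            # Skip the number itself, then the next n characters.
            i = m.end() + int(m.group())
        else:
            out.append(field[i])
            i += 1
    return "".join(out)

def transform_line(line, columns=(1, 3)):
    """Apply strip_counted to the chosen zero-based columns (2nd and 4th by default)."""
    fields = line.split()
    for idx in columns:
        if idx < len(fields):
            fields[idx] = strip_counted(fields[idx])
    return "\t".join(fields)

if __name__ == "__main__":
    print(transform_line("6    ccg8qqqqqqqqqqqqggg    10 ccccg3qqqqqqqqqqqggggg"))
```

On the test line from the answer this gives `ccgqqqqggg` and `ccccgqqqqqqqqggggg` in columns 2 and 4, joined by tabs like the Perl output.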


With Perl, removing each number and the run of identical characters that follows it:

perl -pe 's/\d+([^\d\s])\1*//g'
Denis Ibaev answered Nov 10 '22 02:11
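
To see what the substitution above matches, the same pattern can be tried in Python, whose `re` module also supports the `\1` backreference. This is only a sketch: `strip_runs` is a hypothetical helper name. The pattern needs a non-digit, non-space character right after the digits, so the plain numbers in columns 1 and 3 are left alone even when it is run over the whole line.

```python
import re

# Digits, then one non-digit/non-space character, then any repeats of that character.
RUN = re.compile(r"\d+([^\d\s])\1*")

def strip_runs(text):
    """Remove every number together with the run of identical characters after it."""
    return RUN.sub("", text)

if __name__ == "__main__":
    for line in ("5    cc1kcc1kc    5    cc1kcc1kc",
                 "3    4qqqqcgc1q   12    cgccgccgccgc"):
        print(strip_runs(line))
```

Unlike the literal reading, the number's value is ignored here: `ccg8` followed by twelve `q` characters and `ggg` becomes `ccgggg`, because the whole run of `q` is removed.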