Is there any way to extract the unique characters of each line?
I know I can find the unique lines of a file using
sort -u file
I would like to determine the unique characters of each line (something like sort -u
for each line).
To clarify: given this input:
111223234213
111111111111
123123123213
121212122212
I would like to get this output:
1234
1
123
12
Using sed
sed ':;s/\(.\)\(.*\)\1/\1\2/;t' file
Basically, it captures a character and checks whether that character appears again later on the line, also capturing everything in between. It then replaces the whole match, second occurrence included, with just the first occurrence followed by what was in between. The t command is a test: it jumps back to the : label if the previous command succeeded, so the substitution repeats until the s/// command fails, meaning only unique characters remain. The ; just separates commands.
1234
1
123
12
It keeps the original order as well, since the substitution always deletes the later duplicate and keeps the first occurrence.
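For example, a quick check on a single line fed through a pipe instead of a file:
printf '%s\n' 111223234213 | sed ':;s/\(.\)\(.*\)\1/\1\2/;t'
1234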
It doesn't get things in the original order, but this awk one-liner seems to work:
awk '{for(i=1;i<=length($0);i++){a[substr($0,i,1)]=1} for(i in a){printf("%s",i)} print "";delete a}' input.txt
Split apart for easier reading, it could stand alone like this:
#!/usr/bin/awk -f
{
    # Step through the line, assigning each character as a key.
    # Repeated keys overwrite each other.
    for (i = 1; i <= length($0); i++) {
        a[substr($0, i, 1)] = 1
    }
    # Print items in the array.
    for (i in a) {
        printf("%s", i)
    }
    # Print a newline after we've gone through our items.
    print ""
    # Get ready for the next line.
    delete a
}
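If you do want the original order from awk, one variation on the same idea (a sketch, not from the answer above, using only common awk features) is to use the array purely as a seen-set and build the output string as the line is scanned, so characters come out in order of first appearance:
awk '{
    out = ""
    for (i = 1; i <= length($0); i++) {
        c = substr($0, i, 1)
        # Append c only the first time it shows up on this line.
        if (!(c in seen)) {
            seen[c] = 1
            out = out c
        }
    }
    print out
    # Reset for the next line; split("", seen) is the strictly POSIX spelling.
    delete seen
}' input.txt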
Of course, the same concept can be implemented pretty easily in pure bash as well:
#!/usr/bin/env bash
# IFS= and -r keep leading whitespace and backslashes intact.
while IFS= read -r s; do
    declare -A a
    while [ -n "$s" ]; do
        a[${s:0:1}]=1
        s=${s:1}
    done
    printf "%s" "${!a[@]}"
    echo ""
    unset a
done < input.txt
Note that this depends on bash 4, due to the associative array. It also happens to print the characters in their original order here, though strictly speaking neither awk nor bash guarantees any particular iteration order for these keys, so treat that as luck rather than contract.
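If you want that order guaranteed rather than incidental, a variant of the same loop (a sketch, not part of the original answer) tracks seen characters in a plain string instead of an associative array, which preserves first-appearance order and doesn't even need bash 4:
#!/usr/bin/env bash
while IFS= read -r s; do
    out=""
    while [ -n "$s" ]; do
        c=${s:0:1}
        # Append only the first time we see this character.
        [[ $out == *"$c"* ]] || out+=$c
        s=${s:1}
    done
    printf '%s\n' "$out"
done < input.txt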
And I think you've got a solution using sed
from Jose, though it has a bunch of extra pipe-fitting involved. :)
The last tool you mentioned was grep. I'm pretty sure you can't do this in traditional grep, but perhaps some brave soul might be able to construct a perl-regexp variant (i.e. grep -P) using -o and lookarounds. They'd need more coffee than is in me right now, though.
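For what it's worth, the closest sketch I can offer (where grep -P is available, i.e. GNU grep built with PCRE) doesn't really produce the requested output: -o puts every match on its own line, and the lookahead keeps the last occurrence of each character rather than the first, so both the per-line grouping and the original order are lost:
printf '%s\n' 111223234213 | grep -oP '(.)(?!.*\1)'
4
2
1
3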