How can I find unique characters per line of input?

Tags: grep, bash, sed, awk

Is there any way to extract the unique characters of each line?

I know I can find the unique lines of a file using

sort -u file

I would like to determine the unique characters of each line (something like sort -u for each line).

To clarify: given this input:

111223234213
111111111111
123123123213
121212122212

I would like to get this output:

1234
1
123
12
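(Not part of the original question, just a literal sketch of the "sort -u for each line" idea: fold splits a line into one character per line, sort -u deduplicates, and tr glues the result back onto one line. Note this gives the characters in sorted order, not input order.)

while IFS= read -r line; do
  printf '%s' "$line" | fold -w1 | sort -u | tr -d '\n'
  echo
done < file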
asked Aug 21 '15 by user1436187

2 Answers

Using sed

sed ':;s/\(.\)\(.*\)\1/\1\2/;t' file

Basically, it captures a character and checks whether that same character appears again later on the line, also capturing everything in between. It then replaces all of that, including the second occurrence, with just the first occurrence followed by whatever was in between.

t is the test command: it jumps back to the : label if the previous s/// command made a substitution. This repeats until the s/// command fails, meaning only unique characters remain.

; just separates commands.

1234
1
123
12

Keeps order as well.
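The empty label trick (:; together with a bare t) is a GNU sed idiom and may not work with BSD/macOS sed; a more portable variant with an explicit label (a sketch along the same lines, not part of the original answer) would be:

sed -e ':a' -e 's/\(.\)\(.*\)\1/\1\2/' -e 'ta' file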

answered Oct 14 '22 by 123

It doesn't get things in the original order, but this awk one-liner seems to work:

awk '{for(i=1;i<=length($0);i++){a[substr($0,i,1)]=1} for(i in a){printf("%s",i)} print "";delete a}' input.txt

Split apart for easier reading, it could be stand-alone like this:

#!/usr/bin/awk -f

{
  # Step through the line, assigning each character as a key.
  # Repeated keys overwrite each other.
  for(i=1;i<=length($0);i++) {
    a[substr($0,i,1)]=1;
  }

  # Print items in the array.
  for(i in a) {
    printf("%s",i);
  }

  # Print a newline after we've gone through our items.
  print "";

  # Get ready for the next line.
  delete a;
}
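If keeping the characters in their original left-to-right order matters, a small variation on the same idea (a sketch, not part of the original answer) prints each character the first time it is seen instead of relying on for(i in a):

awk '{out=""; delete a; for(i=1;i<=length($0);i++){c=substr($0,i,1); if(!(c in a)){a[c]=1; out=out c}} print out}' input.txt

As with the one-liner above, delete a on a whole array is a widely supported extension; split("", a) is the strictly portable way to clear it.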

Of course, the same concept can be implemented pretty easily in pure bash as well:

#!/usr/bin/env bash

# IFS= and -r keep leading/trailing whitespace and backslashes intact
# while reading each line.
while IFS= read -r s; do
  declare -A a
  # Peel off one character at a time and use it as an array key;
  # repeated characters just overwrite the same key.
  while [ -n "$s" ]; do
    a[${s:0:1}]=1
    s=${s:1}
  done
  # The keys of the array are the line's distinct characters.
  printf "%s" "${!a[@]}"
  echo ""
  unset a
done < input.txt

Note that this depends on bash 4, due to the associative array. And this one did come out in the original order here, although strictly speaking bash does not guarantee the iteration order of associative array keys any more than awk does.
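If the original order needs to be guaranteed rather than lucky, one pure-bash option (a sketch, not part of the original answer; the variable names are just illustrative) is to build the output string directly and use the associative array only as a "seen" set:

#!/usr/bin/env bash

# Append each character to "out" only the first time it is seen;
# the associative array just remembers which characters have appeared.
while IFS= read -r s; do
  declare -A seen=()
  out=""
  while [ -n "$s" ]; do
    c=${s:0:1}
    if [[ -z ${seen[$c]+x} ]]; then
      seen[$c]=1
      out+=$c
    fi
    s=${s:1}
  done
  printf '%s\n' "$out"
  unset seen
done < input.txt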

And I think you've got a solution using sed from Jose, though it has a bunch of extra pipe-fitting involved. :)

The last tool you mentioned was grep. I'm pretty sure you can't do this in traditional grep, but perhaps some brave soul might be able to construct a perl-regexp variant (i.e. grep -P) using -o and lookarounds. They'd need more coffee than is in me right now though.
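For the record, something like grep -oP '(.)(?!.*\1)' does isolate each line's distinct characters (each at its last occurrence), but -o prints one match per output line, so the per-line grouping is lost. A perl one-liner (a sketch, not grep, and not part of the original answer) produces the requested format and keeps first-seen order:

perl -lne 'my %seen; print grep { !$seen{$_}++ } split //' input.txt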

answered Oct 14 '22 by ghoti