Is there any way to extract the unique characters of each line?
I know I can find the unique lines of a file using
sort -u file
I would like to determine the unique characters of each line (something like sort -u
for each line).
To clarify: given this input:
111223234213
111111111111
123123123213
121212122212
I would like to get this output:
1234
1
123
12
Using sed
sed ':;s/\(.\)\(.*\)\1/\1\2/;t' file
Basically, it captures a character and checks whether that character appears again later on the line, also capturing everything in between. It then replaces the whole match, second occurrence included, with just the first occurrence followed by what was in between. The t command is a test: it jumps back to the : label if the previous command succeeded, so the substitution repeats until the s/// command fails, meaning only unique characters remain. The ; just separates commands.
1234
1
123
12
It keeps the original order as well, since the substitution always deletes the later duplicate and keeps the first occurrence.
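For example, a quick check on a single line fed through a pipe instead of a file:
printf '%s\n' 111223234213 | sed ':;s/\(.\)\(.*\)\1/\1\2/;t'
1234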
It doesn't get things in the original order, but this awk one-liner seems to work:
awk '{for(i=1;i<=length($0);i++){a[substr($0,i,1)]=1} for(i in a){printf("%s",i)} print "";delete a}' input.txt
Split apart for easier reading, it could stand alone like this:
#!/usr/bin/awk -f
{
    # Step through the line, assigning each character as a key.
    # Repeated keys overwrite each other.
    for (i = 1; i <= length($0); i++) {
        a[substr($0, i, 1)] = 1
    }
    # Print items in the array.
    for (i in a) {
        printf("%s", i)
    }
    # Print a newline after we've gone through our items.
    print ""
    # Get ready for the next line.
    delete a
}
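If you do want the original order from awk, one variation on the same idea (a sketch, not from the answer above, using only common awk features) is to use the array purely as a seen-set and build the output string as the line is scanned, so characters come out in order of first appearance:
awk '{
    out = ""
    for (i = 1; i <= length($0); i++) {
        c = substr($0, i, 1)
        # Append c only the first time it shows up on this line.
        if (!(c in seen)) {
            seen[c] = 1
            out = out c
        }
    }
    print out
    # Reset for the next line; split("", seen) is the strictly POSIX spelling.
    delete seen
}' input.txt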
Of course, the same concept can be implemented pretty easily in pure bash as well:
#!/usr/bin/env bash
# IFS= and -r keep leading whitespace and backslashes intact.
while IFS= read -r s; do
    declare -A a
    while [ -n "$s" ]; do
        a[${s:0:1}]=1
        s=${s:1}
    done
    printf "%s" "${!a[@]}"
    echo ""
    unset a
done < input.txt
Note that this depends on bash 4, due to the associative array. It also happens to print the characters in their original order here, though strictly speaking neither awk nor bash guarantees any particular iteration order for these keys, so treat that as luck rather than contract.
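If you want that order guaranteed rather than incidental, a variant of the same loop (a sketch, not part of the original answer) tracks seen characters in a plain string instead of an associative array, which preserves first-appearance order and doesn't even need bash 4:
#!/usr/bin/env bash
while IFS= read -r s; do
    out=""
    while [ -n "$s" ]; do
        c=${s:0:1}
        # Append only the first time we see this character.
        [[ $out == *"$c"* ]] || out+=$c
        s=${s:1}
    done
    printf '%s\n' "$out"
done < input.txt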
And I think you've got a solution using sed
from Jose, though it has a bunch of extra pipe-fitting involved. :)
The last tool you mentioned was grep. I'm pretty sure you can't do this in traditional grep, but perhaps some brave soul might be able to construct a perl-regexp variant (i.e. grep -P) using -o and lookarounds. They'd need more coffee than is in me right now, though.
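For what it's worth, the closest sketch I can offer (where grep -P is available, i.e. GNU grep built with PCRE) doesn't really produce the requested output: -o puts every match on its own line, and the lookahead keeps the last occurrence of each character rather than the first, so both the per-line grouping and the original order are lost:
printf '%s\n' 111223234213 | grep -oP '(.)(?!.*\1)'
4
2
1
3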