Sorting and removing duplicate words in a line

Tags: bash, sorting


This works for me:

$ echo "zebra ant spider spider ant zebra ant" | xargs -n1 | sort -u | xargs
ant spider zebra

xargs -n1 puts each word on its own line, sort -u sorts those lines and drops the duplicates, and the final xargs joins the result back into a single line.


The shell already parses [:blank:]-separated word lists, so xargs is redundant here. The deduplication could be done in the shell itself, but it's easier to use sort -u.

echo $(printf '%s\n' zebra ant spider spider ant zebra ant | sort -u)
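If the words are held in a string rather than typed out literally, the same word-splitting idea can be wrapped in a small function. This is only a sketch: sort_unique_words is a made-up name, and it assumes the default IFS (space, tab, newline). The body runs in a subshell so that turning off globbing does not leak into the caller.

```bash
# sort_unique_words is a hypothetical helper name, not part of the answer above.
# The ( ... ) body runs in a subshell, so "set -f" stays local to the function.
sort_unique_words() (
  set -f        # disable globbing so a word such as * stays literal
  set -- $1     # unquoted on purpose: the shell splits the string on IFS whitespace
  # one word per line, sort and deduplicate, then join the lines with spaces
  printf '%s\n' "$@" | sort -u | paste -sd' ' -
)

sort_unique_words "zebra ant spider spider ant zebra ant"
# prints: ant spider zebra
```

paste -sd' ' is used for the final join instead of echo $(...) so that the result is not word-split and glob-expanded a second time.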


Use tr to change spaces to newlines, then sort -u. The unquoted command substitution passed to echo turns the newlines back into spaces.

echo $(tr ' ' '\n' <<< "zebra ant spider spider ant zebra ant" | sort -u)

All of the answers prior to this one can only sort a single line at a time. The following accepts a whole list of such lines on standard input and prints the sorted list of unique words for each line. Note that asorti is a GNU awk (gawk) extension, so this needs gawk.

awk '{ delete a; for (i=1; i<=NF; i++) a[$i]++; n=asorti(a, b); for (i=1; i<=n; i++) printf "%s ", b[i]; print "" }'

Thanks @jaypai for a lot of the syntax used in this.

Example:

>cat file
group label wearable edit_group edit_group_order label_max camera_elevation camera_distance name label_min label_max value_min value_max camera_angle camera_elevation id
id group label wearable edit_group clothing_morph value_min value_max name value_default clothing_morph group
id label show_simple wearable name edit_group edit_group_order group clothing_morph clothing_morph camera_distance label_min label_max value_min value_max camera_distance camera_angle
id group label wearable name edit_group clothing_morph value_min value_max value_default
group label wearable id clothing_morph edit_group edit_group_order label_min label_max value_min value_max name camera_distance camera_angle camera_elevation
id group label wearable edit_group name label_min label_max value_min value_max wearable
name id group wearable edit_group id group wearable id group wearable id group wearable value_min value_max

>cat file | awk '{ delete a; for (i=1; i<=NF; i++) a[$i]++; n=asorti(a, b); for (i=1; i<=n; i++) printf "%s ", b[i]; print "" }'
camera_angle camera_distance camera_elevation edit_group edit_group_order group id label label_max label_min name value_max value_min wearable 
clothing_morph edit_group group id label name value_default value_max value_min wearable 
camera_angle camera_distance clothing_morph edit_group edit_group_order group id label label_max label_min name show_simple value_max value_min wearable 
clothing_morph edit_group group id label name value_default value_max value_min wearable 
camera_angle camera_distance camera_elevation clothing_morph edit_group edit_group_order group id label label_max label_min name value_max value_min wearable 
edit_group group id label label_max label_min name value_max value_min wearable 
edit_group group id name value_max value_min wearable
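
Where gawk is not available, the same per-line behaviour can be approximated with a plain shell loop. This is a rough sketch rather than a drop-in replacement: sort_unique_per_line is a hypothetical name, it starts a few processes per input line so it will be slower than awk on large files, and it prints no trailing space.

```bash
# sort_unique_per_line is a hypothetical helper name, used only for this sketch.
# It reads lines on standard input and prints each line's unique words, sorted.
sort_unique_per_line() {
  # read with the default IFS trims leading and trailing whitespace;
  # the || test also handles a last line that has no trailing newline
  while read -r line || [ -n "$line" ]; do
    printf '%s\n' "$line" |
      tr -s ' \t' '\n\n' |   # one word per line, squeezing repeated blanks
      sort -u |              # sort and drop duplicate words
      paste -sd' ' -         # join the words back into a single line
  done
}

printf '%s\n' 'b a b' 'z y z x' | sort_unique_per_line
# prints:
# a b
# x y z
```

For the file in the example this would be called as sort_unique_per_line < file.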

Using Python:

$ echo "zebra ant spider spider ant zebra ant" | python -c 'import sys; print(" ".join(sorted(set(sys.stdin.read().split()))))'
ant spider zebra

Using perl:

perl -lane '
  %a = map { $_ => 1 } @F;
  print join qq[ ], sort keys %a;
' <<< "zebra ant spider spider ant zebra ant"

Result:

ant spider zebra