
How do you filter out all unique lines in a file?

Is there a way to filter out all unique lines in a file with command-line tools, without sorting the lines? Essentially, I'd like to do this:

sort -u myFile

without the performance hit of sorting.

asked Apr 03 '13 20:04 by xdhmoore



1 Answer

Remove duplicated lines:

awk '!a[$0]++' file

This is a famous awk one-liner, and there are many explanations of it online. Here is one:

This one-liner is very idiomatic. It registers the lines seen in the associative-array "a" (arrays are always associative in Awk) and at the same time tests if it had seen the line before. If it had seen the line before, then a[line] > 0 and !a[line] == 0. Any expression that evaluates to false is a no-op, and any expression that evals to true is equal to "{ print }".
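To make that explanation concrete, here is a minimal sketch of the same logic written out long-hand. The function name dedupe and the array name seen are illustrative choices, not part of the original answer, and the sample input is made up for the demonstration.

```shell
# Expanded equivalent of: awk '!a[$0]++' file
# "dedupe" and "seen" are illustrative names, not part of the original answer.
dedupe() {
  awk '{
    if (!($0 in seen)) print   # first occurrence: print the line
    seen[$0] = 1               # remember the line for later occurrences
  }' "$@"
}

# Made-up sample input with repeats; first occurrences are kept in input order.
printf 'b\na\nb\nc\na\n' | dedupe
```

This prints b, a, c in that order, whereas sort -u on the same input would print a, b, c. No sorting takes place, but the array keeps one entry per distinct line, so memory use grows with the number of distinct lines.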

answered Nov 08 '22 23:11 by Kent