Unix command to find string set intersections or outliers?

Is there a UNIX command on par with

sort | uniq 

to find string set intersections or "outliers"?

An example application: I have a list of HTML templates. Some of them contain the string {% load i18n %}, others don't. I want to know which files don't.

Edit: grep -L solves the above problem.
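For reference, a minimal sketch of the grep -L approach. The two template files are hypothetical and created in a scratch directory just for illustration. -L (supported by GNU and BSD grep, though not required by POSIX) lists the files that contain no match, and -F makes grep treat the pattern as a literal string.

```shell
# Work in a scratch directory so no real files are touched.
cd "$(mktemp -d)"

# Hypothetical templates: one loads i18n, the other does not.
printf '%s\n' '{% load i18n %}' '<p>hello</p>' > with_i18n.html
printf '%s\n' '<p>hello</p>' > without_i18n.html

# -L: print names of files with NO matching line.
# -F: treat the pattern as a fixed string, not a regular expression.
missing=$(grep -LF '{% load i18n %}' *.html)
echo "$missing"   # prints: without_i18n.html
```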

How about this:

file1:

mom
dad
bob

file2:

dad

%intersect file1 file2

dad

%left-unique file1 file2

mom
bob
asked Jun 19 '09 03:06 by Evgeny



1 Answer

It appears that grep -L solves the real problem of the poster, but for the actual question asked, finding the intersection of two sets of strings, you might want to look into the "comm" command. For example, if file1 and file2 each contain a sorted list of words, one word per line, then

$ comm -12 file1 file2 

will produce the words common to both files. More generally, given sorted input files file1 and file2, the command

$ comm file1 file2 

produces three columns of output:

  1. lines only in file1
  2. lines only in file2
  3. lines in both file1 and file2

You can suppress column N in the output with the -N option. So the earlier command, comm -12 file1 file2, suppresses columns 1 and 2, leaving only the words common to both files.
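Applied to the sample files from the question, this might look like the sketch below. It assumes one word per line and sorts each file first, since comm expects sorted input. The *.sorted file names and the scratch directory are just for illustration. The same idea covers the "left-unique" case: comm -23 suppresses columns 2 and 3, leaving the lines found only in file1.

```shell
# Work in a scratch directory so no real files are touched.
cd "$(mktemp -d)"

# Recreate the question's sample files, one word per line.
printf '%s\n' mom dad bob > file1
printf '%s\n' dad > file2

# comm expects sorted input, so sort each file first.
sort file1 > file1.sorted
sort file2 > file2.sorted

# Intersection: suppress columns 1 and 2, keep lines common to both.
intersect=$(comm -12 file1.sorted file2.sorted)
echo "$intersect"      # prints: dad

# "Left-unique": suppress columns 2 and 3, keep lines only in file1.
left_unique=$(comm -23 file1.sorted file2.sorted)
echo "$left_unique"    # prints bob, then mom (sorted order)
```

Note that the output follows the sorted order, so the left-unique result is bob, mom rather than mom, bob as written in the question.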

answered Oct 23 '22 08:10 by Dale Hagglund