
bash, Linux: Set difference between two text files

I have two files, A-nodes_to_delete and B-nodes_to_keep. Each file has many lines, each containing a numeric id.

I want the list of numeric ids that are in nodes_to_delete but NOT in nodes_to_keep, i.e. the set difference of the two files.

Doing it within a PostgreSQL database is unreasonably slow. Any neat way to do it in bash using Linux CLI tools?

UPDATE: This would seem to be a Pythonic job, but the files are really, really large. I have solved some similar problems using uniq, sort and some set theory techniques. This was about two or three orders of magnitude faster than the database equivalents.

Adam Matan asked Mar 24 '10 16:03


2 Answers

The comm command does that.
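
A minimal sketch of how comm could be applied to this problem, not necessarily what the answerer had in mind. comm needs both inputs sorted in the same collation order, so the sketch sorts them first under LC_ALL=C. The temporary directory and the sample ids are hypothetical stand-ins for the real files.

```shell
# Hypothetical sample data standing in for the two real files.
workdir=$(mktemp -d)
printf '%s\n' 10 3 7 42 > "$workdir/nodes_to_delete"
printf '%s\n' 7 99 10 > "$workdir/nodes_to_keep"

# comm requires sorted input; LC_ALL=C makes sort and comm agree on byte order.
export LC_ALL=C
sort -u "$workdir/nodes_to_delete" > "$workdir/delete.sorted"
sort -u "$workdir/nodes_to_keep" > "$workdir/keep.sorted"

# -2 hides lines found only in the second file, -3 hides lines found in both,
# so only the lines unique to the first file remain.
result=$(comm -23 "$workdir/delete.sorted" "$workdir/keep.sorted")
printf '%s\n' "$result"

rm -rf "$workdir"
```

The sort is lexical rather than numeric here, which is fine for a set difference because comm only needs the two inputs to be ordered consistently.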

msw answered Sep 24 '22 23:09


Somebody showed me how to do exactly this in sh a couple of months ago, and then I couldn't find it for a while... and while looking I stumbled onto your question. Here it is:

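    # Lines present in either file.
    set_union () {
        sort "$1" "$2" | uniq
    }

    # Lines in $1 but not in $2. Listing $2 twice repeats each of its lines,
    # so uniq -u drops them. Assumes $1 has no duplicate lines.
    set_difference () {
        sort "$1" "$2" "$2" | uniq -u
    }

    # Lines in exactly one file. Assumes neither file has duplicate lines.
    set_symmetric_difference () {
        sort "$1" "$2" | uniq -u
    }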
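
One caveat with the uniq -u approach: it drops every line that occurs more than once in the combined input, so an id repeated within the first file disappears from the difference as well. A minimal sketch of a variant that removes duplicates from the first file before combining; the name set_difference_dedup and the sample data are hypothetical, not part of the original answer.

```shell
# Hypothetical variant: collapse duplicates in "$1" first, then apply the
# same "list $2 twice, keep lines that occur once" trick.
set_difference_dedup () {
    # "sort -" reads the deduplicated first file from standard input.
    sort -u -- "$1" | sort - "$2" "$2" | uniq -u
}

# Hypothetical sample data: id 5 appears twice in the first file.
workdir=$(mktemp -d)
printf '%s\n' 5 3 5 8 > "$workdir/a"
printf '%s\n' 8 1 > "$workdir/b"

result=$(set_difference_dedup "$workdir/a" "$workdir/b")
printf '%s\n' "$result"

rm -rf "$workdir"
```

With this sample data, the plain sort/uniq -u pipeline would lose id 5, because it already occurs twice in the first file.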
slinkp answered Sep 22 '22 23:09