I have two files, A (nodes_to_delete) and B (nodes_to_keep). Each file has many lines with numeric ids.
I want the list of numeric ids that are in nodes_to_delete but NOT in nodes_to_keep.
Doing it within a PostgreSQL database is unreasonably slow. Any neat way to do it in bash using Linux CLI tools?
UPDATE: This would seem to be a Pythonic job, but the files are really, really large. I have solved some similar problems using uniq, sort and some set theory techniques. This was about two or three orders of magnitude faster than the database equivalents.
If you want to compare two files byte by byte, use the cmp utility. To produce a human-readable list of differences between text files, use the diff utility.
diff stands for difference. This command is used to display the differences in the files by comparing the files line by line.
The comm command does that.
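A minimal sketch of how comm could be applied to the question, assuming both files hold one id per line. comm expects its inputs sorted in the current locale's collation order, so each file is sorted (and de-duplicated) first; -2 suppresses lines found only in the second file and -3 suppresses lines common to both, leaving the ids that appear only in nodes_to_delete. The sample ids and the temporary directory are made up for illustration.

```shell
set -eu
# Use byte-wise collation so sort and comm agree on the ordering.
export LC_ALL=C

# Hypothetical sample data standing in for the real, much larger files.
workdir=$(mktemp -d)
printf '%s\n' 5 3 9 12 3 > "$workdir/nodes_to_delete"
printf '%s\n' 9 7 5 > "$workdir/nodes_to_keep"

# comm requires sorted input; -u also drops duplicate ids.
sort -u "$workdir/nodes_to_delete" > "$workdir/delete.sorted"
sort -u "$workdir/nodes_to_keep" > "$workdir/keep.sorted"

# -2 hides lines unique to the second file, -3 hides lines in both,
# so only ids unique to nodes_to_delete remain.
result=$(comm -23 "$workdir/delete.sorted" "$workdir/keep.sorted")
printf '%s\n' "$result"

rm -rf "$workdir"
```

With the sample data this prints 12 and 3. The output follows the lexical sort order, not numeric order; pipe it through sort -n if numeric order matters.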
Somebody showed me how to do exactly this in sh a couple of months ago, and then I couldn't find it for a while... and while looking I stumbled onto your question. Here it is:
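# set_difference and set_symmetric_difference assume no duplicate lines within a file
set_union () { sort "$1" "$2" | uniq; }
set_difference () { sort "$1" "$2" "$2" | uniq -u; }
set_symmetric_difference () { sort "$1" "$2" | uniq -u; }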
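A sketch of how set_difference might be called on the question's files. The trick relies on the second file being fed to sort twice: every id in nodes_to_keep then occurs at least twice in the combined stream, so uniq -u (which prints only lines that occur exactly once) drops it. This only works if the first file has no repeated ids, which is why the example passes it through sort -u first. The sample ids and temporary directory are made up for illustration.

```shell
set -eu
export LC_ALL=C

# Listing $2 twice guarantees its lines are never unique in the sorted stream.
set_difference () { sort "$1" "$2" "$2" | uniq -u; }

# Hypothetical sample data; the first file is de-duplicated up front,
# because a repeated id in it would otherwise be dropped by uniq -u.
workdir=$(mktemp -d)
printf '%s\n' 5 3 9 12 3 | sort -u > "$workdir/nodes_to_delete"
printf '%s\n' 9 7 5 > "$workdir/nodes_to_keep"

result=$(set_difference "$workdir/nodes_to_delete" "$workdir/nodes_to_keep")
printf '%s\n' "$result"

rm -rf "$workdir"
```

With the sample data this prints 12 and 3, the ids present in nodes_to_delete but not in nodes_to_keep.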