bash, Linux: Set difference between two text files

Tags: bash, file-io, set-difference

I have two files, A-nodes_to_delete and B-nodes_to_keep. Each file has many lines of numeric IDs.

I want the list of numeric IDs that are in nodes_to_delete but NOT in nodes_to_keep, i.e. the set difference A \ B.

Doing it within a PostgreSQL database is unreasonably slow. Is there a neat way to do it in bash using Linux CLI tools?

UPDATE: This would seem to be a job for Python, but the files are really, really large. I have solved some similar problems using uniq, sort, and some set-theory techniques, and that was about two or three orders of magnitude faster than the database equivalents.

asked Mar 24 '10 16:03

Adam Matan

2 Answers

The comm command does that.
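A minimal sketch of the comm approach, assuming one numeric ID per line in each file. comm expects both inputs to be sorted in the same collating order, so both files are sorted under the same locale first (LC_ALL=C, which is also usually the fastest choice). The function name delete_minus_keep and the sample data are hypothetical, for illustration only.

```bash
#!/usr/bin/env bash
# Minimal sketch, assuming one numeric ID per line in each file.
set -euo pipefail

# Print lines present in $1 but not in $2 (the set difference $1 \ $2).
delete_minus_keep () {
    # comm expects both inputs sorted in the same collation order, so sort
    # and compare under the same locale. sort -u also drops duplicate IDs.
    # -2 hides lines only in $2, -3 hides lines in both files, leaving
    # column 1: lines that appear only in $1.
    LC_ALL=C comm -23 <(LC_ALL=C sort -u "$1") <(LC_ALL=C sort -u "$2")
}

# Hypothetical sample data, written to a scratch directory.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '%s\n' 10 3 7 42 3 > "$tmp/A-nodes_to_delete"
printf '%s\n' 7 42 99 > "$tmp/B-nodes_to_keep"

delete_minus_keep "$tmp/A-nodes_to_delete" "$tmp/B-nodes_to_keep"
# prints 10 and then 3: the order is lexical, not numeric
```

The result comes out in lexical order; pipe it through sort -n afterwards if numeric order matters. Feeding comm with sort -n output instead can give wrong results, because comm expects input ordered by the locale's collating sequence.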

answered Sep 24 '22 23:09

msw

Somebody showed me how to do exactly this in sh a couple of months ago, and then I couldn't find it for a while... and while looking I stumbled onto your question. Here it is:

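    set_union () {
        sort "$1" "$2" | uniq
    }

    # $2 is listed twice so its lines always repeat and uniq -u drops them;
    # sort -u stops an ID repeated within $1 from being dropped too.
    set_difference () {
        { sort -u "$1"; sort -u "$2"; sort -u "$2"; } | sort | uniq -u
    }

    set_symmetric_difference () {
        { sort -u "$1"; sort -u "$2"; } | sort | uniq -u
    }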
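The sort | uniq functions above, like comm, sort all of their input. If sorting is the bottleneck and the "keep" file fits in memory, a hash-based awk filter avoids sorting entirely. This is a different technique from either answer, offered as a sketch under that memory assumption; with really large keep files it may not fit in RAM. The function name set_difference_awk and the sample data are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch of a hash-based set difference (not the sort/uniq method above):
# no sorting, but every line of $2 is held in memory.
set -eu

# Print lines of $1 that do not appear in $2, once each, in $1's order.
set_difference_awk () {
    # While awk reads its first argument ($2), FILENAME == ARGV[1]: store
    # each line as an array key. For $1, print a line only if it is not a
    # key in keep and has not already been printed (seen[] dedupes).
    # FILENAME is used instead of the common NR == FNR test, which
    # misbehaves when $2 is empty.
    awk 'FILENAME == ARGV[1] { keep[$0]; next }
         !($0 in keep) && !seen[$0]++' "$2" "$1"
}

# Hypothetical sample data, written to a scratch directory.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '%s\n' 10 3 7 42 3 > "$tmp/A-nodes_to_delete"
printf '%s\n' 7 42 99 > "$tmp/B-nodes_to_keep"

set_difference_awk "$tmp/A-nodes_to_delete" "$tmp/B-nodes_to_keep"
# prints 10 and then 3, in the order they first appear in A-nodes_to_delete
```

Unlike the sort-based versions, this keeps the original order of the first file, at the cost of memory proportional to the size of the second file.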

answered Sep 22 '22 23:09

slinkp
