How to remove the lines which appear on file B from another file A?

How do I remove a common line from two files in UNIX?

To remove the lines that two files have in common, you can use grep, comm, or join. grep is practical mainly for small files, because every line of file2 is loaded as a pattern. Use -v together with -f: grep -v -f file2 file1 displays the lines from file1 that do not match any pattern from file2.

If the files are sorted (they are in your example):

comm -23 file1 file2

-2 suppresses the lines that appear only in file2, and -3 suppresses the lines that appear in both files, so only the lines unique to file1 are printed. If the files are not sorted, pipe them through sort first...
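
As a rough sketch of the unsorted case, the following sorts both files into temporary copies before calling comm. The file names and sample data are made up for illustration, and LC_ALL=C is an assumption added here so that sort and comm agree on the collation order.

```shell
# Hypothetical sample data in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' cherry apple banana date > "$workdir/file1"
printf '%s\n' date banana > "$workdir/file2"

# comm needs sorted input, so sort each file first.
LC_ALL=C sort "$workdir/file1" > "$workdir/file1.sorted"
LC_ALL=C sort "$workdir/file2" > "$workdir/file2.sorted"

# -2 drops lines found only in file2, -3 drops lines found in both,
# so only the lines unique to file1 remain.
result=$(LC_ALL=C comm -23 "$workdir/file1.sorted" "$workdir/file2.sorted")
printf '%s\n' "$result"

rm -rf "$workdir"
```

This prints apple and cherry in sorted order, so the original order of file1 is lost. comm also matches duplicate lines one for one: a line that occurs twice in file1 and once in file2 is still printed once.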

See the comm man page (man comm) for details.

grep -Fvxf <lines-to-remove> <all-lines>

works on non-sorted files
maintains the order
is POSIX

Example:

cat <<EOF > A
b
1
a
0
01
b
1
EOF

cat <<EOF > B
0
1
EOF

grep -Fvxf B A

Output:

b
a
01
b

Explanation:

-F: use literal strings instead of the default BRE
-x: only consider matches that match the entire line
-v: print only the non-matching lines
-f file: take patterns from the given file
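
To see why -x and -F matter, here is a small sketch that runs grep with and without each flag. The A and B files repeat the example above; the words and pattern files are hypothetical names added for illustration.

```shell
workdir=$(mktemp -d)
printf '%s\n' b 1 a 0 01 b 1 > "$workdir/A"
printf '%s\n' 0 1 > "$workdir/B"

# Without -x the patterns match anywhere in a line, so "01" is removed too.
substring_result=$(grep -Fvf "$workdir/B" "$workdir/A")
# With -x only lines exactly equal to "0" or "1" are removed.
whole_line_result=$(grep -Fvxf "$workdir/B" "$workdir/A")

printf '%s\n' abc a.c xyz > "$workdir/words"
printf '%s\n' a.c > "$workdir/pattern"

# Without -F the dot is a regex wildcard, so "abc" is removed as well.
regex_result=$(grep -vxf "$workdir/pattern" "$workdir/words")
# With -F the pattern is literal, so only "a.c" is removed.
literal_result=$(grep -Fvxf "$workdir/pattern" "$workdir/words")

printf 'without -x:\n%s\n' "$substring_result"
printf 'with -x:\n%s\n' "$whole_line_result"
printf 'without -F:\n%s\n' "$regex_result"
printf 'with -F:\n%s\n' "$literal_result"

rm -rf "$workdir"
```

Without -x the output is b, a, b. With -x it is b, a, 01, b. Without -F only xyz survives, and with -F both abc and xyz do.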

This method is slower on pre-sorted files than other methods, since it is more general. If speed matters as well, see: Fast way of finding lines in one file that are not in another?

Here's a quick bash automation for in-place operation:

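remove-lines() (
  remove_lines="$1"
  all_lines="$2"
  tmp_file="$(mktemp)" || exit 1
  # grep exits with 1 when nothing is left to print; 2 or more is a real error.
  grep -Fvxf "$remove_lines" "$all_lines" > "$tmp_file"
  if [ "$?" -gt 1 ]; then
    rm -f "$tmp_file"
    exit 1
  fi
  mv "$tmp_file" "$all_lines"
)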

Usage:

remove-lines lines-to-remove remove-from-this-file

See also: https://unix.stackexchange.com/questions/28158/is-there-a-tool-to-get-the-lines-in-one-file-that-are-not-in-another

awk to the rescue!

This solution doesn't require sorted inputs. You have to provide fileB first.

awk 'NR==FNR{a[$0];next} !($0 in a)' fileB fileA

returns

A
C

How does it work?

The NR==FNR{a[$0];next} idiom stores the lines of the first file as keys of an associative array for a later "contains" test.

NR==FNR checks whether we are still scanning the first file: only there does the global line counter (NR) equal the per-file line counter (FNR).

a[$0] adds the current line to the associative array as a key. The array behaves like a set, so duplicate lines are stored only once.

!($0 in a) applies once we are in the next file(s). in is a "contains" test: it checks whether the current line is in the set populated from the first file, and ! negates the condition. The action is omitted here because the default action is {print}, which is usually not written explicitly.
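
The two passes can be tried on a small example. This is only a sketch: the sample data is made up, and the array is called seen here instead of a purely for readability.

```shell
workdir=$(mktemp -d)
printf '%s\n' b 1 a 0 01 b 1 > "$workdir/fileA"
printf '%s\n' 0 1 > "$workdir/fileB"

# First pass (NR==FNR): remember every line of fileB as an array key.
# Second pass: print the lines of fileA that are not keys in the array.
result=$(awk 'NR==FNR { seen[$0]; next } !($0 in seen)' "$workdir/fileB" "$workdir/fileA")
printf '%s\n' "$result"

rm -rf "$workdir"
```

This prints b, a, 01, b, keeping the order and the duplicates of fileA. One caveat: if fileB is empty, NR==FNR also holds while fileA is being read, so nothing is printed. Comparing FILENAME with ARGV[1] instead of NR==FNR avoids that.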

Note that this can now be used to remove blacklisted words.

$ awk '...' badwords allwords > goodwords

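With a slight change, it can clean multiple lists and write a cleaned version of each. The parentheses around the output file name keep the redirection portable across awk implementations.

$ awk 'NR==FNR{a[$0];next} !($0 in a){print > (FILENAME ".clean")}' bad file1 file2 file3 ...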
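
A small sketch of the multi-file variant, assuming hypothetical files named bad, list1, and list2 in a scratch directory:

```shell
workdir=$(mktemp -d)
printf '%s\n' spam junk > "$workdir/bad"
printf '%s\n' hello spam world > "$workdir/list1"
printf '%s\n' junk keep > "$workdir/list2"

# Lines of the first file become the blacklist; every other input file
# gets its surviving lines written to "<name>.clean" next to it.
awk 'NR==FNR { bad[$0]; next } !($0 in bad) { print > (FILENAME ".clean") }' \
  "$workdir/bad" "$workdir/list1" "$workdir/list2"

clean1=$(cat "$workdir/list1.clean")
clean2=$(cat "$workdir/list2.clean")
printf 'list1.clean:\n%s\n' "$clean1"
printf 'list2.clean:\n%s\n' "$clean2"

rm -rf "$workdir"
```

Here list1.clean contains hello and world, and list2.clean contains keep. Note that a list whose lines are all blacklisted gets no .clean file at all, because awk only creates the file on the first print.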

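Another way to do the same thing (also requires sorted input). Note that join compares only the first blank-separated field by default, so it is a whole-line comparison only when the lines contain no blanks: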

join -v 1 fileA fileB

In Bash, if the files are not pre-sorted:

join -v 1 <(sort fileA) <(sort fileB)
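
Because join pairs lines on a key field rather than on the whole line, the result can differ from grep -Fvxf when lines contain blanks. The sketch below uses made-up data and temporary sorted copies instead of process substitution so that it also runs in plain sh. LC_ALL=C is an assumption added so that sort and join use the same ordering.

```shell
workdir=$(mktemp -d)
printf '%s\n' 'banana split' 'apple pie' > "$workdir/fileA"
printf '%s\n' 'apple tart' > "$workdir/fileB"

LC_ALL=C sort "$workdir/fileA" > "$workdir/fileA.sorted"
LC_ALL=C sort "$workdir/fileB" > "$workdir/fileB.sorted"

# join pairs lines on the first field ("apple"), so "apple pie" is dropped too.
join_result=$(LC_ALL=C join -v 1 "$workdir/fileA.sorted" "$workdir/fileB.sorted")
# grep -Fvx compares whole lines, so both lines of fileA survive.
grep_result=$(grep -Fvxf "$workdir/fileB" "$workdir/fileA")

printf 'join:\n%s\n' "$join_result"
printf 'grep:\n%s\n' "$grep_result"

rm -rf "$workdir"
```

join prints only banana split, while grep prints banana split and apple pie. For single-word lines the two approaches agree, apart from join emitting its output in sorted order.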
