Combine two files in linux without repetition

Tags: c, linux, shell, awk

I have two files, file1 and file2.

The contents of file1 are:

Hello
  how
are you
when can i meet you
film??

The contents of file2 are:

Hello 
how 
are you
darling
when can i meet you

I want to generate a file that combines the two files, like this:

Hello
how
are you
darling
when can i meet you
film??

Note: The leading spaces on the second line of file1 should be ignored in the final file. Is there any built-in function in C or Linux that does this, or can a script be written to do it?

Asked by Manu, Jan 09 '13 10:01




2 Answers

Easy job for awk:

$ awk '{$1=$1}!u[$0]++' file2 file1
Hello
how
are you
darling
when can i meet you
film??
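
For anyone unfamiliar with the idioms: assigning `$1=$1` makes awk rebuild the record from its fields, which drops leading and trailing blanks and squeezes runs of blanks into single spaces, and `!u[$0]++` is true only the first time a given line is seen. A minimal sketch that recreates the sample files and runs the same command (the scratch directory and the variable name `merged` are illustrative assumptions, not part of the question):

```shell
#!/bin/sh
# Recreate the sample files in a throwaway directory (path is illustrative).
tmpdir=$(mktemp -d)
printf '%s\n' 'Hello' '  how' 'are you' 'when can i meet you' 'film??' > "$tmpdir/file1"
printf '%s\n' 'Hello ' 'how ' 'are you' 'darling' 'when can i meet you' > "$tmpdir/file2"

# $1=$1    : rebuild the record, normalising whitespace
# !u[$0]++ : true only the first time a normalised line is seen
merged=$(awk '{$1=$1} !u[$0]++' "$tmpdir/file2" "$tmpdir/file1")

printf '%s\n' "$merged"
rm -rf "$tmpdir"
```

Listing file2 first affects only the ordering: "darling" keeps its position from file2, and "film??", which exists only in file1, comes last.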

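Or, if you don't care about the order of the output (trailing whitespace has to be stripped as well, since file2 has lines ending in a space; the exact sort order depends on your locale):

$ sed 's/^[[:space:]]*//; s/[[:space:]]*$//' file1 file2 | sort -u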
are you
darling
film??
Hello
how
when can i meet you

Answered by Chris Seymour, Sep 20 '22 20:09


Here's one way using awk:

awk '{ gsub(/^[ \t]+|[ \t]+$/,"") } !a[$0]++' file2 file1

Results:

Hello
how
are you
darling
when can i meet you
film??

EDIT:

The problem with:

awk '{ $1=$1 } !a[$0]++' file2 file1

is that, although it works well for this simple example, it treats similar lines as identical: it removes not only leading and trailing whitespace but also extra whitespace between fields. For example, if file1 contains:

Hello
  how
are you
when  can i meet you
film??

Both the:

when can i meet you

and:

when  can i meet you

lines would be treated as the same line. This may be the desired result, but based on your question, I think it's best to strip only leading and trailing whitespace, as the first command does. HTH.
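
The difference between the two commands can be checked with a small sketch like the one below. The two-line input and the variable names `squeezed` and `trimmed` are made up for illustration:

```shell
#!/bin/sh
# One line with normal spacing, one with padding and a doubled interior space.
input=$(printf '%s\n' 'when can i meet you' '  when  can i meet you ')

# Field rebuild: interior whitespace is squeezed, so both lines collapse into one.
squeezed=$(printf '%s\n' "$input" | awk '{ $1=$1 } !a[$0]++')

# Trim only: the doubled space survives, so both lines are kept.
trimmed=$(printf '%s\n' "$input" | awk '{ gsub(/^[ \t]+|[ \t]+$/, "") } !a[$0]++')

printf 'rebuild: %s line(s)\n' "$(printf '%s\n' "$squeezed" | wc -l | tr -d ' ')"
printf 'trim:    %s line(s)\n' "$(printf '%s\n' "$trimmed" | wc -l | tr -d ' ')"
```

With this input the field-rebuild variant should report one line and the trim-only variant two, which is the behaviour described above.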

Answered by Steve, Sep 19 '22 20:09