Combine two files in linux without repetition

Tags: c, linux, shell, awk

I have two files, file1 and file2.

The contents of file1 are:

Hello
  how
are you
when can i meet you
film??

The contents of file2 are:

Hello 
how 
are you
darling
when can i meet you

I want to generate a file that combines the two files without repeating any line, like this:

Hello
how
are you
darling
when can i meet you
film??

Note: the leading space in the second line of file1 should be ignored in the final file. Is there a built-in function in C or a Linux tool that does this, or can a script be written to do it?


asked Jan 09 '13 10:01

Manu

2 Answers

Easy job for awk:

$ awk '{$1=$1}!u[$0]++' file2 file1
Hello
how
are you
darling
when can i meet you
film??
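
For readers unfamiliar with the idiom, here is a sketch of the same one-liner written out with comments. The function name merge_unique and the array name seen are only illustrative; the behavior should match the command above.

```shell
# Hypothetical wrapper around the one-liner; pass the files in the order
# in which their lines should appear in the output.
merge_unique() {
    awk '
        # Reassigning any field makes awk rebuild $0 from its fields:
        # leading and trailing blanks vanish and inner runs collapse to one space.
        { $1 = $1 }

        # seen[$0]++ is 0 (false) the first time a line appears, so the
        # negation is true exactly once and awk prints the line by default.
        !seen[$0]++
    ' "$@"
}
```

Called as merge_unique file2 file1, it should print the six lines shown above; file2 is listed first so that darling keeps its position between are you and when can i meet you.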

Or if you don't care about the order of the output:

$ sed 's/^\s*//' file1 file2 | sort -u 
are you
darling
film??
Hello
how
when can i meet you
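
Note that the sed command above strips only leading whitespace, and \s is a GNU extension. If the files may also contain trailing blanks, a sketch along the following lines could be used instead; the function name merge_sorted is hypothetical, and the POSIX character class [[:space:]] is swapped in for \s.

```shell
# Hypothetical helper: trim both ends of every line, then sort and
# keep a single copy of each distinct line.
merge_sorted() {
    sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' "$@" | sort -u
}
```

It would be called as merge_sorted file1 file2. The order of the sorted output depends on the current locale, so uppercase lines may sort before lowercase ones on some systems.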


answered Sep 20 '22 20:09

Chris Seymour

Here's one way using awk:

awk '{ gsub(/^[ \t]+|[ \t]+$/,"") } !a[$0]++' file2 file1

Results:

Hello
how
are you
darling
when can i meet you
film??

EDIT:

The problem with:

awk '{ $1=$1 } !a[$0]++' file2 file1

is that, although it works well for this simple example, it will treat similar lines as identical: it removes not only leading and trailing whitespace, but also collapses extra whitespace between fields. For example, if file1 contains:

Hello
  how
are you
when  can i meet you
film??

Both the:

when can i meet you

and:

when  can i meet you

lines would be treated as the same line. This may be the desired result, but based on your question, I think it's best to strip only leading and trailing whitespace, as in the first command. HTH.
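
To see the difference concretely, here is a small sketch that feeds both variants the same two lines, which differ only in their inner spacing. The sample input and the variable names are made up for illustration.

```shell
# Two lines that differ only in the number of spaces between "when" and "can".
sample() { printf 'when can i meet you\nwhen  can i meet you\n'; }

# Variant 1: rebuilding the record collapses inner blanks, so the lines merge.
collapse_count=$(sample | awk '{ $1 = $1 } !a[$0]++' | wc -l)

# Variant 2: only the ends are trimmed, so the two lines stay distinct.
trim_count=$(sample | awk '{ gsub(/^[ \t]+|[ \t]+$/, "") } !a[$0]++' | wc -l)

echo "collapsed variant kept $collapse_count line(s), trimmed variant kept $trim_count line(s)"
```

The first variant should keep one line and the second should keep two, which is the behavior described above.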

answered Sep 19 '22 20:09

Steve
