
Remove duplicate lines from file

I have a list of URLs, most of which are duplicates:

> http://example.com/some/a-test-link.html
> http://example.com/some/a-test-link.html
> http://example.com/some/another-link.html
> http://example.com/some/another-link.html
> http://example.com/some/again-link.html
> http://example.com/some/again-link.html

I don't need the same link twice, so I want to remove the duplicates and keep just one copy of each link. How can I do this with regular expressions, sed, or awk? (I'm not sure which tool would be best.) I'm using Ubuntu as my operating system and Sublime Text 3 as my editor.

asked Mar 30 '26 by Tamim Ibrahim

1 Answer

Very trivial using awk:

awk '!seen[$0]++' file

which is shorthand for:

awk '!($0 in seen) {seen[$0]; print}'

If the line is not already a key in the seen array, awk adds it and prints the line; every subsequent occurrence of that line is skipped. (In the short form, seen[$0]++ evaluates to 0 the first time a line appears, so !seen[$0]++ is true only then.)

$ cat file
> http://example.com/some/a-test-link.html
> http://example.com/some/a-test-link.html
> http://example.com/some/another-link.html
> http://example.com/some/another-link.html
> http://example.com/some/again-link.html
> http://example.com/some/again-link.html
$ awk '!seen[$0]++' file
> http://example.com/some/a-test-link.html
> http://example.com/some/another-link.html
> http://example.com/some/again-link.html
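The one-liner writes the de-duplicated lines to stdout, so to keep the result you redirect it to a new file. A minimal sketch of that, plus the sort -u alternative (which also removes duplicates but sorts the lines as a side effect) — the file paths here are illustrative:

```shell
# Sample file with duplicate lines (path is just an example)
printf 'a\nb\na\nc\nb\n' > /tmp/urls.txt

# De-duplicate while preserving the original line order
awk '!seen[$0]++' /tmp/urls.txt > /tmp/urls.dedup.txt

# sort -u also removes duplicates, but reorders the lines
sort -u /tmp/urls.txt > /tmp/urls.sorted.txt

# GNU awk 4.1+ can edit the file in place instead of redirecting:
# gawk -i inplace '!seen[$0]++' /tmp/urls.txt
```

Note that `awk '!seen[$0]++' file > file` would truncate the file before awk reads it, which is why the output goes to a separate file (or to gawk's -i inplace).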
answered Apr 02 '26 by jaypal singh
