
Remove Duplicate Lines in Text File

Tags: arrays, file, php

I have a text file from which I am trying to remove duplicate lines.

Text file example:

new featuredProduct('', '21640'), 
new featuredProduct('', '24664'), 
new featuredProduct('', '22142'), 
new featuredProduct('', '22142'), 
new featuredProduct('', '22142'), 
new featuredProduct('', '22142'), 
new featuredProduct('', '22142'), 

The PHP Code I've tried:

$lines = file('textfile.txt');
$lines = array_unique($lines);
file_put_contents('textfile.txt', implode($lines));

The PHP file is called duplicates.php and the text file is in the same directory. I would like to be left with only:

new featuredProduct('', '21640'), 
new featuredProduct('', '24664'), 
new featuredProduct('', '22142'),  

The idea is that file() reads the file into the $lines array, array_unique() removes the duplicate entries, and the filtered result is then written back to the same file.

Asked by Hexana on Jan 09 '23

2 Answers

The problem is the newline character at the end of each line. Because the last line doesn't have a trailing newline, it isn't equal to the otherwise identical lines above it, so array_unique() doesn't remove it.

So just remove the newlines when you read the file and add them back when you save it again:

$lines = file('test.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
$lines = array_unique($lines);
file_put_contents('test.txt', implode(PHP_EOL, $lines));

If you do var_dump($lines); right after your original file() call, you will see it:

array(7) {
  [0]=>
  string(36) "new featuredProduct('', '21640'), 
"
  [1]=>
  string(36) "new featuredProduct('', '24664'), 
"
  [2]=>
  string(36) "new featuredProduct('', '22142'), 
"
  [3]=>
  string(36) "new featuredProduct('', '22142'), 
"
  [4]=>
  string(36) "new featuredProduct('', '22142'), 
"
  [5]=>
  string(36) "new featuredProduct('', '22142'), 
"
  [6]=>
  string(34) "new featuredProduct('', '22142'), "
       //^^ See here                            ^ And here
}
Answered by Rizier123 on Jan 10 '23


I know this question is about PHP, and I don't know whether you are on Linux/Unix or Windows, but there is a really nice one-line solution using awk that I think will be much faster for big files. You can even execute it from PHP with a system call:

awk '!a[$0]++' input.txt
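
If you do want to run it from PHP, a minimal sketch could look like this (the output file name is made up for illustration, and it assumes a Unix-like system with awk on the PATH and shell_exec() enabled; escapeshellarg() just quotes the arguments for the shell):

<?php
// Hypothetical file names; adjust to your paths.
$input  = 'textfile.txt';
$output = 'textfile.unique.txt';

// awk keeps a line only the first time it is seen: a[$0]++ is 0 (falsy)
// on the first occurrence of a line and non-zero on every repeat.
$cmd = 'awk ' . escapeshellarg('!a[$0]++') . ' '
     . escapeshellarg($input) . ' > ' . escapeshellarg($output);

shell_exec($cmd);

Writing to a separate output file avoids the shell truncating the input file before awk has read it; rename it afterwards if you want to replace the original.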
Answered by Axalix on Jan 10 '23