
Delete lines that contain specific words/phrases with PHP

Tags:

text

php

I have a text file and I want to remove the lines that contain specific words.

 <?php
// set source file name and path
$source = "problem.txt";

// read raw text as array
$raw = file($source) or die("Cannot read file");

Now I have an array from which I want to remove some lines, and then go on using what is left.

Jimmy asked Feb 15 '10 17:02



2 Answers

Since each line of your file is one element of the array, the array_filter function might interest you. Quoting the manual:

array array_filter ( array $input [, callback $callback ] )

Iterates over each value in the input array passing them to the callback function.
If the callback function returns true, the current value from input is returned into the result array. Array keys are preserved.

You can then use strpos (or stripos, for a case-insensitive match) to determine whether one string is contained in another.

For instance, let's suppose we have this array:

$arr = array(
  'this is a test',
  'glop test',
  'i like php',
  'a badword, glop is', 
);

We could define a callback function that filters out the lines containing "glop":

function keep_no_glop($line) {
  if (strpos($line, 'glop') !== false) {
    return false;
  }
  return true;
}

And use that function with array_filter:

$arr_filtered = array_filter($arr, 'keep_no_glop');
var_dump($arr_filtered);

And we'd get this kind of output:

array
  0 => string 'this is a test' (length=14)
  2 => string 'i like php' (length=10)

That is, we have removed all the lines containing the bad word "glop".
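Note that the keys in that output are 0 and 2, because array_filter preserves the original keys. If the code that follows expects consecutive indexes, one possible sketch is to wrap the call in array_values. The function name line_has_no_glop is only a hypothetical name for this illustration; the sample lines are the ones used above.

```php
<?php
// Same sample lines as in the example above.
$arr = array(
  'this is a test',
  'glop test',
  'i like php',
  'a badword, glop is',
);

// Hypothetical callback name: keep the line only when "glop" does not occur in it.
function line_has_no_glop($line) {
  return strpos($line, 'glop') === false;
}

// array_filter keeps the original keys (0 and 2 here);
// array_values renumbers them starting from 0.
$arr_reindexed = array_values(array_filter($arr, 'line_has_no_glop'));
```

After this, $arr_reindexed holds the two remaining lines under the keys 0 and 1.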


Of course, now that you have the basic idea, nothing prevents you from using a more complex callback function ;-)
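As one possible example of a more complex callback, here is a sketch that matches "glop" only as a whole word and ignores case, using preg_match instead of strpos. The function name keep_no_glop_word and the sample lines are assumptions made for this illustration, not part of the original question.

```php
<?php
// Hypothetical sample lines for this illustration.
$arr = array(
  'this is a test',
  'Glop test',
  'i like php',
  'glopping along',
);

// Hypothetical callback: keep the line unless it contains "glop" as a whole word.
// \b is a word boundary and the i modifier makes the match case-insensitive.
function keep_no_glop_word($line) {
  return preg_match('/\bglop\b/i', $line) !== 1;
}

$arr_filtered = array_filter($arr, 'keep_no_glop_word');
```

With these lines, 'Glop test' is dropped despite the capital letter, while 'glopping along' is kept because "glop" is only part of a longer word there.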


Edit after comments: here's a fuller portion of code that should work.

First of all, you have your list of lines:

$arr = array(
  'this is a test',
  'glop test',
  'i like php',
  'a badword, glop is', 
);

Then, you load the list of bad words from a file. You trim each line and remove empty lines, so that the $bad_words array only contains actual words and no blank entries that would cause trouble:

$bad_words = array_filter(array_map('trim', file('your_file_with_bad_words.txt')));
var_dump($bad_words);

With my test file, the $bad_words array contains:

array
  0 => string 'glop' (length=4)
  1 => string 'test' (length=4)

Then comes the callback function, which loops over that array of bad words.

Note: using a global variable is not that nice :-( But the callback function called by array_filter doesn't get any other parameter, and I didn't want to load the file each time the callback function is called.

function keep_no_glop($line) {
  global $bad_words;
  foreach ($bad_words as $bad_word) {
      if (strpos($line, $bad_word) !== false) {
        return false;
      }
  }
  return true;
}

And, as before, you can use array_filter to filter the lines:

$arr_filtered = array_filter($arr, 'keep_no_glop');
var_dump($arr_filtered);

Which, this time, gives you:

array
  2 => string 'i like php' (length=10)
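
If PHP 5.3 or later is available, the global variable can probably be avoided by passing array_filter a closure that captures the word list with "use". This is only a sketch: the variable name $keep_clean is hypothetical, and the hard-coded $bad_words array stands in for the list loaded from the file.

```php
<?php
$arr = array(
  'this is a test',
  'glop test',
  'i like php',
  'a badword, glop is',
);

// Stand-in for the list loaded from the bad-words file.
$bad_words = array('glop', 'test');

// "use ($bad_words)" copies the list into the closure, so no global is needed.
$keep_clean = function ($line) use ($bad_words) {
  foreach ($bad_words as $bad_word) {
    if (strpos($line, $bad_word) !== false) {
      return false; // the line contains a bad word: drop it
    }
  }
  return true; // no bad word found: keep the line
};

$arr_filtered = array_filter($arr, $keep_clean);
```

The word list is still loaded only once, and the callback no longer depends on a variable in the global scope.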
Pascal MARTIN answered Oct 18 '22 09:10



This will remove all rows that contain a blacklisted word:

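// FILE_IGNORE_NEW_LINES strips the trailing newline from each row,
// so implode("\n", ...) below does not double the line breaks.
$rows = file("problem.txt", FILE_IGNORE_NEW_LINES);
$blacklist = "foo|bar|lol";

foreach($rows as $key => $row) {
    // Drop the row if it matches any of the blacklisted words.
    if(preg_match("/($blacklist)/", $row)) {
        unset($rows[$key]);
    }
}

file_put_contents("solved.txt", implode("\n", $rows));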

Or, if you are using PHP 5.3 or later, you can pass an anonymous function to array_filter:

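$rows = file("problem.txt", FILE_IGNORE_NEW_LINES);
$blacklist = "foo|bar|lol";
// "use" imports $blacklist into the closure; returning true KEEPS a row,
// so the match result is negated.
$rows = array_filter($rows, function($row) use ($blacklist) {
    return !preg_match("/($blacklist)/", $row);
});

file_put_contents("solved.txt", implode("\n", $rows));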

Prior to PHP 5.3, a solution using array_filter would actually take more lines of code than the first solution I posted, so I'll leave that out.
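
As a possible variation, preg_grep with the PREG_GREP_INVERT flag returns the rows that do not match a pattern, so no loop or callback is needed. The sketch below assumes the blacklist entries are literal words rather than regular expressions, so each one is escaped with preg_quote. The sample rows and the variable names $quoted and $kept are made up for this illustration; in practice $rows would come from file().

```php
<?php
// Stand-in for file("problem.txt", FILE_IGNORE_NEW_LINES).
$rows = array('keep me', 'foo here', 'a+b is fine', 'lol');

// Hypothetical blacklist of literal words; "a.b" shows why escaping matters.
$blacklist = array('foo', 'bar', 'lol', 'a.b');

// preg_quote escapes regex metacharacters, so "a.b" only matches a literal dot.
$quoted = array();
foreach ($blacklist as $word) {
    $quoted[] = preg_quote($word, '/');
}
$pattern = '/(' . implode('|', $quoted) . ')/';

// PREG_GREP_INVERT returns the rows that do NOT match the pattern.
$kept = preg_grep($pattern, $rows, PREG_GREP_INVERT);
```

Like array_filter, preg_grep preserves the original keys, so $kept here holds the rows at keys 0 and 2. Without preg_quote, the entry "a.b" would also have matched 'a+b is fine'.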

Tatu Ulmanen answered Oct 18 '22 09:10
