I have a PHP script which reads a large CSV and performs certain actions, but only if the "username" field is unique. The CSV is used in more than one script, so changing the input from the CSV to only contain unique usernames is not an option.
The very basic program flow (which I'm wondering about) goes like this:
$allUsernames = array();
while($row = fgetcsv($fp)) {
$username = $row[0];
if (in_array($username, $allUsernames)) continue;
$allUsernames[] = $username;
// process this row
}
Since this CSV could actually be quite large, it's that in_array
bit which has got me thinking. The most ideal situation when searching through an array for a member is if it is already sorted, so how would you build up an array from scratch, keeping it in order? Once it is in order, would there be a more efficient way to search it than using in_array()
, considering that it probably doesn't know the array is sorted?
From the php manual: Arrays are ordered. The order can be changed using various sorting functions.
php function sortArray() { $inputArray = array(8, 2, 7, 4, 5); $outArray = array(); for($x=1; $x<=100; $x++) { if (in_array($x, $inputArray)) { array_push($outArray, $x); } } return $outArray; } $sortArray = sortArray(); foreach ($sortArray as $value) { echo $value . "<br />"; } ?>
The rsort() function sorts an indexed array in descending order. Tip: Use the sort() function to sort an indexed array in ascending order.
Not keeping the array in order, but how about this kind of optimization? I'm guessing isset()
for an array key should be faster than in_array()
search.
$allUsernames = array();
while($row = fgetcsv($fp)) {
$username = $row[0];
if (isset($allUsernames[$username])) {
continue;
} else {
$allUsernames[$username] = true;
// do stuff
}
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With