This is the code I'm using as I work my way to a solution.
public function indexAction()
{
    //id3 options
    $options = array("version" => 3.0, "encoding" => Zend_Media_Id3_Encoding::ISO88591, "compat" => true);
    //path to collection
    $path = APPLICATION_PATH . '/../public/Media/Music/';//Currently Approx 2000 files
    //inner iterator
    $dir = new RecursiveDirectoryIterator($path, RecursiveDirectoryIterator::SKIP_DOTS);
    //iterator
    $iterator = new RecursiveIteratorIterator($dir, RecursiveIteratorIterator::SELF_FIRST);
    foreach ($iterator as $file) {
        if (!$file->isDir() && $file->getExtension() === 'mp3') {
            //real path to mp3 file
            $filePath = $file->getRealPath();
            Zend_Debug::dump($filePath);//current results: accepted path no errors
            $id3 = new Zend_Media_Id3v2($filePath, $options);
            //reset per file so frames from the previous file do not leak into this one
            $data = array();
            foreach ($id3->getFramesByIdentifier("T*") as $frame) {
                $data[$frame->identifier] = $frame->text;
            }
            Zend_Debug::dump($data);//currently can scan the whole collection without timing out, but APIC data not being processed.
        }
    }
}
The problem: process a file system of MP3 files spread across multiple directories, extract the ID3 tag data to a database (3 tables), and extract the cover image from the tag to a separate file.
I can handle the actual extraction and data handling. My issue is with output.
With the way Zend Framework 1.x handles output buffering, it is difficult to output an indicator that the files are being processed. In an old-style PHP script without output buffering, you could print a bit of HTML on every iteration of the loop and have some indication of progress.
I would like to be able to process each album's directory, output the results, and then continue on to the next album's directory, only requiring user intervention on certain errors.
Any help would be appreciated.
JavaScript is not the solution I'm looking for. I feel that this should be possible within the constructs of PHP and a ZF 1 MVC.
I'm doing this mostly for my own enlightenment; it seems a very good way to learn some important concepts.
[EDIT]
OK, how about some ideas on how to break this down into smaller chunks? Process one chunk, commit, process the next chunk, that kind of thing. In or out of ZF.
[EDIT]
I'm beginning to see the problem with what I'm trying to accomplish. It seems that output buffering is not just happening in ZF, it's happening everywhere from ZF all the way to the browser. Hmmmmm...
This is a typical example of what you should not do, because:
You are trying to parse ID3 tags with PHP, which is slow, and trying to parse multiple files at once will definitely make it even slower.
RecursiveDirectoryIterator would load all the files in a folder and its subfolders, and from what I can see there is no limit. It can be 2,000 files today and 100,000 the next day. Total processing time is unpredictable and can definitely take hours in some cases.
There is a high dependence on a single file system. With your current architecture the files are stored on the local system, so it would be difficult to split the files and do proper load balancing.
You are not checking whether a file's information has already been extracted, which results in duplicated looping and extraction.
There is no locking system, which means the process can be started several times simultaneously, resulting in generally slow performance on the server.
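The last two points (no check for already-extracted files, no locking) can be sketched in plain PHP. This is only a rough illustration, not part of Zend Framework: the helper names `runExclusively`, `filterUnprocessed` and `fileFingerprint` are hypothetical, the lock is a simple `flock()` on a file you choose, and the `$seen` array stands in for whatever table you would use to record processed files. It assumes PHP 5.5+ for `finally`.

```php
<?php
// Hypothetical helpers; the lock file path and the $seen store are assumptions.

/**
 * Run $job only if no other run holds the lock.
 * Returns false (without running $job) when the lock is already taken.
 */
function runExclusively($lockFile, callable $job)
{
    $handle = fopen($lockFile, 'c'); // create if missing, never truncate
    if ($handle === false) {
        throw new RuntimeException("Cannot open lock file: $lockFile");
    }
    // LOCK_NB makes flock() return immediately instead of waiting.
    if (!flock($handle, LOCK_EX | LOCK_NB)) {
        fclose($handle);
        return false;
    }
    try {
        $job();
    } finally {
        flock($handle, LOCK_UN);
        fclose($handle);
    }
    return true;
}

/** Identify a file by path, size and modification time. */
function fileFingerprint($path)
{
    return md5($path . '|' . filesize($path) . '|' . filemtime($path));
}

/**
 * Return only the paths whose fingerprint is not yet a key of $seen,
 * keyed by fingerprint so the caller can record them afterwards.
 */
function filterUnprocessed(array $paths, array $seen)
{
    $pending = array();
    foreach ($paths as $path) {
        $key = fileFingerprint($path);
        if (!isset($seen[$key])) {
            $pending[$key] = $path;
        }
    }
    return $pending;
}
```

In a real setup the fingerprints would probably live in a database column next to the tag data, so a changed file gets picked up again while an untouched one is skipped.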
My advice is not to use a loop or RecursiveDirectoryIterator to process the files in bulk.
Target each file as soon as it is uploaded or transferred to the server. That way you are only working with one file at a time, and you can spread out the processing time.
Your problem is exactly what job queues are designed for. You are also not limited to implementing the parsing in PHP; you can take advantage of C or C++ for performance.
Advantage
PHP
server in C
Examples I have tested
Expected Process Client
Expected Process Server
Finally, this processing can be done on multiple servers in parallel.
One solution would be to use a job queue, such as Gearman. Gearman is an excellent solution for this kind of problem and integrates easily with Zend Framework (http://blog.digitalstruct.com/2010/10/17/integrating-gearman-into-zend-framework/)
It allows you to create a worker to process each "chunk", so your process continues unblocked while the job is processed. This is very handy for long-running processes such as music or image processing: http://gearman.org/index.php?id=getting_started
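To give a rough idea of the shape, here is a minimal sketch of a client and worker using the PECL gearman extension. It assumes the extension is installed and a gearmand server is running. The function name `extract_id3`, the JSON payload, and the helpers `encodeJob`, `decodeJob`, `queueFiles` and `runWorker` are made up for illustration; the actual tag extraction is passed in as a callback.

```php
<?php
// Sketch only: needs the PECL gearman extension and a running gearmand.
// The job name 'extract_id3' and the payload shape are hypothetical.

/** Serialize one file path into a job payload. */
function encodeJob($filePath)
{
    return json_encode(array('path' => $filePath));
}

/** Read the file path back out of a job payload. */
function decodeJob($workload)
{
    $data = json_decode($workload, true);
    if (!is_array($data) || !isset($data['path'])) {
        throw new InvalidArgumentException('Malformed job payload');
    }
    return $data['path'];
}

/** Client side: queue one background job per file and return immediately. */
function queueFiles(GearmanClient $client, array $paths)
{
    foreach ($paths as $path) {
        $client->doBackground('extract_id3', encodeJob($path));
    }
}

/** Worker side: run from the command line, outside the web request. */
function runWorker(GearmanWorker $worker, callable $extract)
{
    $worker->addFunction('extract_id3', function (GearmanJob $job) use ($extract) {
        $extract(decodeJob($job->workload()));
    });
    while ($worker->work()) {
        // Each iteration handles one job; the loop ends when work() fails.
    }
}
```

The controller would create a `GearmanClient`, call `addServer()` on it and hand it to `queueFiles()`, while a separate CLI script does the same with a `GearmanWorker` and `runWorker()`. The web request then returns right away and the workers do the slow ID3 parsing in the background.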